feat(tasks): clean-sheet rebuild spec for the async tasks plugin#208
Merged
Conversation
Adds specs/tasks-async-system.md as the active design for the bundled tasks plugin. Supersedes ORA-165 (six task types, Y.Doc pages, 700-line sub-agent prompt) and the TASK-31 port-as-is plan. Both prior docs gain a SUPERSEDED header pointing at the new spec. The new design collapses tasks to one primitive (TaskSpec as YAML+markdown), puts all storage behind a TaskSpecStore port (Redis today, filesystem when the runtime grows the API), keeps the approval gate entirely in the plugin (replacing the deleted tryHandleApprovalResponse plumbing), and ships preview-before-save plus conditional dedicated rooms in MVP.
Amends specs/tasks-async-system.md with three simplifications driven by review: - Storage port reframed as a 4-method filesystem (TaskFs: read/write/delete/ list) with paths under /users/<did>/tasks/<id>/. Redis adapter today; a UcanFsAdapter (same auth model as sandbox) when the runtime grows the API. Ephemeral state (run locks, pending approvals, BullMQ) stays on direct Redis and never goes through the port. - The worker no longer rebuilds RuntimeContext. It calls MessagesService.invokeForAutomation — a single thin entry point on the runtime — which reuses every existing agent path (auth, credits middleware, capability gating, checkpointer). BackgroundContextBuilder and the attendant DI of UCAN/Secrets/Sessions/Matrix/AMBIENT are deleted. - Webhook + chain.after triggers deferred out of this rebuild to a follow-up. MVP ships time.once + time.cron only. Phase 1 effort drops from 6d to 5d; total from 8.5d to ~7d.
…istory Three MVP cuts driven by review: - No REST surface. Tasks are agent-managed; the 6 HTTP endpoints (list/get/ patch/pause/resume/cancel) and the controllers folder are gone. Re-add later only when a non-agent consumer wants it. (FOLLOWUP-5.) - Per-run cost/token meter pushed to "good to have". The credits middleware still deducts, but surfacing the numbers from MessagesService.invokeFor- Automation hasn't been verified, so the preview output schema and the AutomationResult shape drop the tokens/costUsd fields for now. (FOLLOWUP-2.) - Per-task monthly budget enforcement pushed (depends on the cost meter). budget.monthlyUsd removed from the TaskSpec frontmatter and example. (FOLLOWUP-4.) - Run history saving pushed. No runs.jsonl on disk in MVP. Matrix room messages are the natural run log; get_task returns a room link plus the last-error block. RunSummary type removed from folder layout. (FOLLOWUP-3.) Self-healing still works in MVP via a Redis failure counter (tasks:failures:<id> hash with count + lastError + lastFailedAt) — reset on success, drives the failed-pending-review threshold and feeds suggest_spec_fix. Sections 14 (REST) and 17 (Cost accounting) deleted; remaining sections renumbered (15→14, 16→15, 18→16, 19→17, 20→18, 21→19, 22→20). Phase 1 effort drops from 5d to 4d; total from ~7d to ~5.5d.
Replaces the BUNDLED_PLUGINS stub with a real TasksPlugin that ships: - TaskSpec — YAML frontmatter + markdown body; round-trips via gray-matter, validated by Zod. Stored at /users/<did>/tasks/<id>/spec.md. - TaskFs port — 4 methods (read / write / delete / list). RedisTaskFs is the only implementation today; UcanFsAdapter (FOLLOWUP-1) is the planned swap once the runtime grows a per-user UCAN filesystem. - EphemeralStateService — direct ioredis for run locks, failure counters (count + lastError + lastFailedAt), preview tokens, and pending approval payloads. Never goes through TaskFs because none of it is user-readable. - SchedulerService + 2 BullMQ queues — `task_run` (5 concurrency, 3 retries exponential) and `task_approval` (10 concurrency, fixed backoff). Cron next-fire computed via cron-parser. - TaskRunWorker — loads spec, calls MessagesService.sendMessage with msgFromMatrixRoom=true, hands result to delivery or approval gate. No parallel buildRuntimeContext, no createMainAgent call. - ApprovalService — owns request / approve / reject / handleTimeout, exposes APPROVAL_GATE_PORT for cross-plugin reuse. - ApprovalGateMiddleware — pre-LLM intercept with fast keyword path → cheap LLM fallback. Re-enqueues the resolution as a BullMQ job so the worker owns the post-message work. Replaces the deleted tryHandleApproval- Response plumbing with zero edits to MessagesService. - IntentClassifier — fast path covers yes/no/ok/cancel/approve/reject and short-prefix variants; slow path is utility-tier LLM with a 1-word return. - RoomResolver + DedicatedRoomService — heuristic decides whether a task gets its own [Task] <title> room (sub-day cron, webhook/chain, "monitor" intents, or explicit). Spec.md posted as opening message. - AutomationInvoker — single call into MessagesService; resolves the user's main room via MatrixManager.getOracleRoomIdWithHomeServer. - 9 main-agent tools — preview_task (with required previewToken handshake), create_task, list_my_tasks, get_task, update_task, pause/resume/cancel _task, suggest_spec_fix. - 3 starter templates — morning-brief, url-monitor, weekly-report. - 20 unit tests covering manifest, auto-detect, configSchema, tool set, spec round-trip, schema validation, paths, hashing, trigger summary, intent classifier fast path, and room resolver truth table. Deps added to oracle-runtime: cron-parser, gray-matter. `apps/app/src/tasks/` is NOT removed in this change — that's coupled to the broader TASK-32 apps/app deletion. The new plugin lives independently.
… plumbing
Full review of the first implementation found a fatal bug plus systemic
over-engineering, so this is a clean rewrite (24 internal files → 14).
The fatal bug: the worker invoked the agent with a synthetic sessionId,
but RequestPreparer.prepare() throws NotFoundException for unknown
sessions — every task run would have failed, pushing every task to
failed-pending-review. Fixed properly: each task now owns ONE real
session (SessionManagerService.createSession, anchored in the task's
dedicated room when there is one), stored as `sessionId` in the spec
frontmatter. Every run continues that LangGraph thread, which also gives
tasks memory across runs ("compare to what you reported yesterday") via
the checkpointer, for free. No new MessagesService surface needed.
Other v1 problems fixed by the rebuild:
- Dedicated rooms never worked: matrix-bot-sdk's createRoom returns the
room id string, and v1 cast the result to `{room_id?}` — always
undefined. (The cast hid it; removed.)
- Failure counting was per BullMQ attempt, so one flaky run (3 retries)
hit the 3-failure threshold instantly. Now @OnWorkerEvent('failed')
records one failure per exhausted run and reschedules the next cron
occurrence so transient failures don't kill the task.
- The middleware created its own Redis client + Queue via a module-level
side-channel map, and resolution took 3 hops (middleware → BullMQ
"resolve" job → worker → service). Now: the Nest module hands its wired
service bundle back to the plugin instance via onReady; the middleware
shares those services and resolves approvals in one call.
- The "cheap LLM classifier" slow path was dead code — never wired. The
design is now honest: keyword fast path resolves yes/no without any
model call (wrapModelCall returns the ack AIMessage directly); nuanced
replies go to the main agent with a per-call system-prompt hint and a
new resolve_pending_approval tool. The agent IS the classifier.
- Dead config and decorative fields removed: TASKS_DEFAULT_TIMEZONE
(never read), modelTier (sendMessage has no model override — storing
it would lie), delivery.format (the body's "How to report" section is
the contract), draft status (nothing ever was one), preview expiresAt
(redundant with Redis TTL). TASKS_MAX_PER_USER is now actually
enforced in create_task.
- Redis key names, spec parse/render, and failed-pending-review
transitions each live in exactly one place now (RedisState, TaskStore,
and the two services that own them) instead of being duplicated across
files.
- An 8-key service-locator registry is gone; preview tokens are single-
use (GETDEL); run job ids are per-fire-time (taskId:runAtIso) so
re-enqueues are idempotent without fighting BullMQ's completed-job
id memory.
Tools go from 9 to 10 (resolve_pending_approval). Tests go from 20 to 29
and now cover TaskStore against an in-memory TaskFs, nextRunAtFor, and
the classifier table. specs/tasks-async-system.md updated to match the
implemented design (frontmatter, approval flow, worker shape, folder
layout, follow-ups).
Full end-to-end trace of every code path turned up six real bugs. Each fix below is independently scoped; together they fall out of one architectural change. Critical: 1. **Preview pollutes the user's main Matrix room.** preview_task created a session via SessionManagerService.createSession, which posts "New Conversation Started" to the resolved room. With no roomId passed, that's the user's main oracle room. Every preview leaked. 2. **Tasks without a dedicated room appear in the user's session list.** The persistent task session was anchored in the main room when the user opted out of a dedicated room, so SessionsService.listSessions surfaced it. 3. **Dedicated room posted with sessionId: ''.** create_task built the spec, created the dedicated room (posting renderSpec(spec)), THEN filled sessionId. Empty string leaked into the room. Fix for 1/2/3: every run (and every preview) uses AgentInvoker.runOnce, which creates a synthetic session with overrideEventId (skips the Matrix post per SessionManagerService.createSession's `??` short-circuit), runs one agent turn, deletes the session. No persistent task session, no sessionId in the spec frontmatter. The user's main room and session list stay clean. We lose "memory across runs" — accept as a non-feature for MVP (file as FOLLOWUP if ever needed). 4. **Preview token consumed before validation.** consumePreview was a destructive GETDEL called before the task-limit / next-run checks. A user hitting the limit lost their token. Split into peekPreview (non-destructive) + deletePreview (destructive); peek up-front, delete only after the spec is saved and the run enqueued. 5. **delivery.post swallowed errors.** On Matrix outage, the worker thought the run succeeded — resetFailures, schedule next cron — while the user never saw the result. post now throws on failure; the worker bubbles it as a run failure (BullMQ retries; consecutive counter advances on the final attempt). Non-critical posts (approval prompts, reminders, failed-pending-review notice) call delivery.safePost, which keeps the swallow-and-log behaviour where surrounding state shouldn't roll back. 6. **claimApprovalResolution TTL was 5 min while approvals last 49 h.** A crash between claim and clear let the claim expire long before the pending payload — a later resolver could double-deliver. TTL is now passed in; takePending uses APPROVAL_TTL_SEC. Minor: - get_task now returns a curated subset instead of spreading all frontmatter (drops owner and raw trigger). - Dedicated-room post is a friendly markdown summary (title, trigger, approval, body) instead of raw YAML frontmatter. All 654 tests still pass; typecheck and lint clean. Spec updated to match: the frontmatter schema drops sessionId, §10 documents the synthetic-session approach, the glossary swaps "Task session" for "runOnce".
36c99b1 to
c4fb104
Compare
- Introduced `synthetic-session.ts` and `synthetic-session.test.ts` to manage synthetic session IDs for background tasks. - Updated `MessagesService` to handle synthetic session IDs correctly, ensuring they do not interfere with thread relations and post-sync operations. - Enhanced the `TasksPlugin` to include a new approval flow, allowing tasks to wait for user approval before executing actions. - Added middleware for task approval, enabling fast classification of user replies and integration with the approval flow. - Updated task specifications to support the new approval mechanism and ensure proper handling of task states. - Refactored related components to streamline the task execution process and improve overall reliability. All tests have been updated to reflect these changes and ensure functionality remains intact.
- Updated tests for `acquireToolLock` to assert that releasing locks does not throw exceptions. - Added checks for independent lock keys to ensure isolation between different keys. - Improved overall test coverage for lock functionality.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds specs/tasks-async-system.md as the active design for the bundled tasks
plugin. Supersedes ORA-165 (six task types, Y.Doc pages, 700-line sub-agent
prompt) and the TASK-31 port-as-is plan. Both prior docs gain a SUPERSEDED
header pointing at the new spec.
The new design collapses tasks to one primitive (TaskSpec as YAML+markdown),
puts all storage behind a TaskSpecStore port (Redis today, filesystem when
the runtime grows the API), keeps the approval gate entirely in the plugin
(replacing the deleted tryHandleApprovalResponse plumbing), and ships
preview-before-save plus conditional dedicated rooms in MVP.