🐛 fix(v1.5): rc.27 — agent-hosted update operationId threading + container snapshot (#289) by s-b-e-n-s-o-n · Pull Request #393 · CodesWhat/drydock

s-b-e-n-s-o-n · 2026-05-23T12:02:46Z

Closes the long-running same-name / cross-host update-state regression on #289. Two coordinated fixes on top of rc.26:

1. Thread the agent's container snapshot through every level of the agent-scoped operation pipeline (`a7f633b7`)

Cross-host updates silently dropped notification triggers (Pushover, Telegram, etc.) and produced incomplete SSE toasts for containers running on a connected agent. The controller built the agent-scoped operation row from buildAgentOperationBase without the container snapshot, so when markOperationTerminal fired emitTerminalLifecycleEvent, the resulting emitContainerUpdateApplied / emitContainerUpdateFailed carried only {operationId, containerName, containerId}. The notification trigger handler then fell back to findContainerByBusinessId, which compares the agent's bare containerName (tautulli) against the controller's fullName (mediavault_docker_tautulli) and silently dropped — the same class of findContainerByBusinessId miss as #385 but on the agent-scoped operation path.

Threads container through buildAgentOperationBase / ensureAgentOperationForTerminal / markAgentOperationTerminal / maybeMarkAgentOperationSucceededFromAppliedPayload / maybeMarkAgentOperationFailedFromFailedPayload, stamping agent: this.name. Existing-row race (dd:update-operation-changed before dd:update-applied) patches container onto the active row via updateOperation before the terminal emit. Adds container to MutableUpdateOperationFields so patch types accept it.

2. Thread the controller's `operationId` end-to-end so agent-hosted updates use exactly one row (`d38ea2c7`)

Even with fix #1, a follow-up rc.25 video showed a [mediavault] Container Tautulli update failed — Marked failed after exceeding active update TTL (1800000ms) while queued. Pushover arriving ~30 minutes after a successful update. Cause: createAcceptedContainerUpdateRequest mints a controller-side queued row with a crypto.randomUUID() operationId, but AgentTrigger.trigger(container) discarded the runtimeContext so the operationId never reached the agent. The agent minted its own UUID and fired lifecycle events keyed by that id, which the controller routed through toAgentScopedId into a third (agent-scoped) row. The original controller-side queued row was therefore never touched, sat past UPDATE_OPERATION_ACTIVE_TTL_MS, and was force-failed by the TTL sweep — which fired the misleading "failed" notification with the row's still-valid container snapshot.

Threads the controller operationId end-to-end:

AgentTrigger.trigger / triggerBatch accept and forward runtimeContext
AgentClient.runRemoteTrigger / runRemoteTriggerBatch extract per-container operationIds via getRequestedOperationId and include them in the agent payload ({id, name, operationId} single; {...container, operationId} per entry for batches)
app/api/trigger.ts runTrigger accepts an operationId in the body (validated by triggerRequestBodySchema) and threads it into requestContainerUpdate
app/agent/api/trigger.ts runTriggerBatch extracts per-container operationIds into {operationIds} runtimeContext before forwarding to the local trigger
EnqueueContainerUpdateOptions gains an operationId field honored by createAcceptedContainerUpdateRequest (single-container batches only; multi-container batches still mint per-container UUIDs)
New AgentClient.resolveAgentOperationId checks the controller's operation store for an existing row at the raw (unscoped) id and reuses it when found, falling back to toAgentScopedId when the agent does not echo a known controller id (backwards-compat with older agents)

The controller-side queued row therefore transitions directly to in-progress and succeeded/failed from the agent's lifecycle events, no parallel agent-scoped row is created, the TTL sweep has nothing stale to fail, and the spurious "update failed" notification disappears.

Tests / coverage

9903 app tests pass with 100% coverage (statements / branches / functions / lines)
New tests cover: AgentTrigger runtimeContext forwarding, runRemoteTrigger/Batch operationId payload, agent runTrigger/runTriggerBatch operationId extraction, request-update operationId honoring (single-container only), and AgentClient.resolveAgentOperationId reuse-vs-scope behavior
Build clean, lint clean (no new biome findings)

Fixes: #289

…perations Cross-host updates (#289) silently dropped notification triggers (Pushover) and produced incomplete SSE toasts for containers running on a connected agent. The controller built the agent-scoped operation row from buildAgentOperationBase without the container snapshot, so when markOperationTerminal fired emitTerminalLifecycleEvent, the resulting emitContainerUpdateApplied / emitContainerUpdateFailed carried only {operationId, containerName, containerId}. The trigger handler then fell back to findContainerByBusinessId, which compares the agent's bare containerName (`tautulli`) against the controller's fullName (`mediavault_docker_tautulli`) and silently dropped. Threads container through buildAgentOperationBase / ensureAgentOperationForTerminal / markAgentOperationTerminal / maybeMarkAgentOperationSucceededFromAppliedPayload / maybeMarkAgentOperationFailedFromFailedPayload, stamping agent: this.name. Existing-row race (dd:update-operation-changed before dd:update-applied) patches container onto the active row via updateOperation before the terminal emit. Adds container to MutableUpdateOperationFields so patch types accept it. Fixes: #289

…er Unreleased Adds an Unreleased entry for the agent-scoped update-operation container threading fix shipped in a7f633b so the rc.27 release notes pick it up alongside the rc.26 batch.

…d updates use one row Agent-hosted container updates (#289 follow-up) silently spawned a second "update failed" Pushover ~30 min after a successful update because the controller-side queued row was orphaned. createAcceptedContainerUpdateRequest minted a queued row with a controller operationId, but AgentTrigger.trigger() discarded the runtimeContext, so AgentClient.runRemoteTrigger posted {id,name} without an operationId. The agent then minted its own UUID and fired lifecycle events keyed by that UUID, which the controller scoped via toAgentScopedId into a third (agent-scoped) row. The original controller-side queued row was never touched, sat past UPDATE_OPERATION_ACTIVE_TTL_MS, and was force-failed by the TTL sweep with "Marked failed... while queued" — which fired a misleading failure notification on a successful update. Thread the controller operationId end-to-end so one row owns the lifecycle: - AgentTrigger.trigger / triggerBatch accept and forward runtimeContext. - AgentClient.runRemoteTrigger / runRemoteTriggerBatch extract per-container operationIds via getRequestedOperationId and include them in the agent payload ({id,name,operationId} single; {...container,operationId} batches). - app/api/trigger.ts runTrigger accepts operationId in the body (validated by triggerRequestBodySchema) and threads it into requestContainerUpdate. - app/agent/api/trigger.ts runTriggerBatch extracts per-container operationIds into {operationIds} runtimeContext before forwarding to the local trigger. - EnqueueContainerUpdateOptions gains an operationId field honored by createAcceptedContainerUpdateRequest (single-container batches only; multi-container batches still mint per-container UUIDs). - AgentClient.resolveAgentOperationId checks the controller's operation store for an existing row at the raw (unscoped) id and reuses it when found, falling back to toAgentScopedId when the agent does not echo a known controller id (backwards-compat with older agents). 9903 tests pass, 100% coverage on app. Adds targeted tests for AgentTrigger runtimeContext forwarding, runRemoteTrigger/Batch operationId payload, agent runTrigger/runTriggerBatch operationId extraction, request-update operationId honoring, and AgentClient.resolveAgentOperationId reuse-vs-scope behavior. Fixes: #289

vercel · 2026-05-23T12:02:52Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
drydock-demo	Ready	Preview, Comment	May 23, 2026 2:32pm
drydock-website	Ready	Preview, Comment	May 23, 2026 2:32pm

biggest-littlest

Approving — closes the two #289 regressions: (1) agent container snapshot threading so notification triggers resolve agent-hosted containers; (2) controller operationId threaded end-to-end so the orphaned queued row + spurious TTL-failure notification is eliminated. 100% coverage held.

biggest-littlest

Approving — closes the two #289 regressions: (1) agent container snapshot threading so notification triggers resolve agent-hosted containers; (2) controller operationId threaded end-to-end so the orphaned queued row + spurious TTL-failure notification is eliminated. 100% coverage held.

ALARGECOMPANY

Approving — operationId threading is clean (single source of truth for the lifecycle row), and the backwards-compat fallback in resolveAgentOperationId preserves older-agent behavior. Coverage held.

ALARGECOMPANY

Approving — operationId threading is clean (single source of truth for the lifecycle row), and the backwards-compat fallback in resolveAgentOperationId preserves older-agent behavior. Coverage held.

codecov · 2026-05-23T12:13:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

CodeQL flagged the new per-container operationIds map in app/agent/api/trigger.ts:runTriggerBatch as a prototype-pollution risk: container.id arrives in the request body and was used directly as a property name on a plain {} (high severity, js/remote-property-injection). Replace the map with an Object.create(null) instance and reject container.id values that are non-string, empty, or one of __proto__ / constructor / prototype before using them as property names. Adds two targeted tests covering the forbidden-key and non-string-id branches. Tests: 9905 pass, 100% coverage held.

biggest-littlest

Re-approving after CodeQL fix (af82937) — null-prototype guard + forbidden-key validation on the new operationIds map closes the js/remote-property-injection finding. Added tests cover the forbidden-key and non-string-id branches. Coverage held.

biggest-littlest

Re-approving after CodeQL fix (af82937) — null-prototype guard + forbidden-key validation on the new operationIds map closes the js/remote-property-injection finding. Added tests cover the forbidden-key and non-string-id branches. Coverage held.

ALARGECOMPANY

Re-approving after CodeQL fix — null-prototype + forbidden-key guard is the right pattern. Tests cover the new branches.

ALARGECOMPANY

Re-approving after CodeQL fix — null-prototype + forbidden-key guard is the right pattern. Tests cover the new branches.

Replace the null-prototype object with a Map in app/agent/api/trigger.ts:runTriggerBatch. CodeQL's js/remote-property-injection query treats obj[userKey] = ... as a PropWrite sink regardless of whether obj has Object.prototype on its chain — Object.create(null) does not satisfy the query. Per CodeQL's own QHelp ('This case should be avoided whenever possible by using the ECMAScript 2015 Map instead'), Map writes are simply not recognized as a sink, so this is the canonical mitigation. Consumer (app/triggers/providers/docker/update-runtime-context.ts) now accepts Map | Record for operationIds, branches on instanceof Map for the Map path, and uses Object.hasOwn + Reflect.get for the legacy Record path as defense-in-depth (the legacy producer in request-update.ts mints UUIDs internally, but the guard costs nothing and locks the lookup to own keys). Tests: 9907 pass, 100% coverage. Adds explicit prototype-pollution assertion (Object.prototype must remain unmodified after a malicious __proto__/constructor/prototype payload).

biggest-littlest

Re-approving after CodeQL Map fix (b76e06d). Per CodeQL's own QHelp for js/remote-property-injection, Map is the canonical mitigation — Map writes aren't recognized as a sink. Consumer accepts Map | Record with Object.hasOwn + Reflect.get on the Record path as defense-in-depth. Coverage 100%.

biggest-littlest

Re-approving after CodeQL Map fix (b76e06d). Per CodeQL's own QHelp for js/remote-property-injection, Map is the canonical mitigation — Map writes aren't recognized as a sink. Consumer accepts Map | Record with Object.hasOwn + Reflect.get on the Record path as defense-in-depth. Coverage 100%.

ALARGECOMPANY

Re-approving — Map is the right call per CodeQL docs. Defense-in-depth on the legacy Record path is a nice bonus.

) The previous '__proto__: string' construction in the prototype-chain test for operationIds triggered CodeQL's js/invalid-prototype-value rule and didn't actually exercise the path it claimed to (string values at __proto__ in an object literal are silently ignored). Replace with a regular Object.create(prototypeWithDataProp) so the consumer's hasOwn guard is genuinely exercised against an inherited data property. Coverage held at 100%.

…op Trivy banner - Roll v1.4.0–1.4.4 into a single v1.4.x row and trim per-version prose; the README roadmap is a summary, CHANGELOG.md owns the detail. - Wrap Recent Updates and Roadmap in <details> so the long lists are collapsed by default; centered h2 preserved inside <summary>. - Drop the Trivy supply-chain advisory blockquote at the top — drydock was confirmed unaffected (Actions not used, binary pinned to v0.69.3, SHA-pinned workflows), enough time has passed, and the full advisory remains at /security/trivy-supply-chain-march-2026.

biggest-littlest

Re-approving after rc.27 test-fix + README polish. CodeQL js/invalid-prototype-value cleared by 6a9b9e6 (replace the __proto__: 'evil-proto' object-literal — which is silently dropped by the engine — with Object.create(prototypeWithDataProp) so the consumer's Object.hasOwn guard is genuinely exercised against an inherited data property). 6c4fce6 is docs-only README cleanup. Coverage held at 100%.

ALARGECOMPANY

Re-approving — fix is the right shape (Object.create(prototypeWithDataProp) actually exercises the hasOwn guard against an inherited data property; the previous __proto__: 'evil-proto' literal was silently dropped). Docs-only README polish on top. 100% coverage held.

s-b-e-n-s-o-n added 3 commits May 22, 2026 20:32

📝 docs(changelog): record #289 cross-host update-notification fix und…

2132183

…er Unreleased Adds an Unreleased entry for the agent-scoped update-operation container threading fix shipped in a7f633b so the rc.27 release notes pick it up alongside the rc.26 batch.

s-b-e-n-s-o-n requested review from ALARGECOMPANY and biggest-littlest as code owners May 23, 2026 12:02

s-b-e-n-s-o-n temporarily deployed to ci-codecov May 23, 2026 12:02 — with GitHub Actions Inactive

biggest-littlest approved these changes May 23, 2026

View reviewed changes

biggest-littlest previously approved these changes May 23, 2026

View reviewed changes

ALARGECOMPANY approved these changes May 23, 2026

View reviewed changes

ALARGECOMPANY previously approved these changes May 23, 2026

View reviewed changes

vercel Bot deployed to Preview – drydock-website May 23, 2026 12:04 View deployment

github-advanced-security AI found potential problems May 23, 2026

View reviewed changes

Comment thread app/agent/api/trigger.ts Fixed

s-b-e-n-s-o-n dismissed stale reviews from ALARGECOMPANY and biggest-littlest via af82937 May 23, 2026 12:31

s-b-e-n-s-o-n temporarily deployed to ci-codecov May 23, 2026 12:31 — with GitHub Actions Inactive

vercel Bot deployed to Preview – drydock-demo May 23, 2026 12:31 View deployment

vercel Bot deployed to Preview – drydock-website May 23, 2026 12:33 View deployment

biggest-littlest approved these changes May 23, 2026

View reviewed changes

biggest-littlest previously approved these changes May 23, 2026

View reviewed changes

ALARGECOMPANY approved these changes May 23, 2026

View reviewed changes

ALARGECOMPANY previously approved these changes May 23, 2026

View reviewed changes

github-advanced-security AI found potential problems May 23, 2026

View reviewed changes

Comment thread app/agent/api/trigger.ts Fixed

s-b-e-n-s-o-n dismissed stale reviews from ALARGECOMPANY and biggest-littlest via b76e06d May 23, 2026 13:27

s-b-e-n-s-o-n temporarily deployed to ci-codecov May 23, 2026 13:27 — with GitHub Actions Inactive

biggest-littlest approved these changes May 23, 2026

View reviewed changes

biggest-littlest previously approved these changes May 23, 2026

View reviewed changes

ALARGECOMPANY previously approved these changes May 23, 2026

View reviewed changes

vercel Bot deployed to Preview – drydock-demo May 23, 2026 13:28 View deployment

vercel Bot deployed to Preview – drydock-website May 23, 2026 13:29 View deployment

github-advanced-security AI found potential problems May 23, 2026

View reviewed changes

Comment thread app/triggers/providers/docker/update-runtime-context.test.ts Fixed

s-b-e-n-s-o-n added 2 commits May 23, 2026 10:10

s-b-e-n-s-o-n dismissed stale reviews from ALARGECOMPANY and biggest-littlest via 6c4fce6 May 23, 2026 14:30

s-b-e-n-s-o-n temporarily deployed to ci-codecov May 23, 2026 14:31 — with GitHub Actions Inactive

vercel Bot deployed to Preview – drydock-website May 23, 2026 14:32 View deployment

vercel Bot deployed to Preview – drydock-demo May 23, 2026 14:32 View deployment

biggest-littlest approved these changes May 23, 2026

View reviewed changes

ALARGECOMPANY approved these changes May 23, 2026

View reviewed changes

s-b-e-n-s-o-n merged commit 74de57c into main May 23, 2026
25 checks passed

s-b-e-n-s-o-n deleted the feature/v1.5-rc27 branch May 23, 2026 14:54

s-b-e-n-s-o-n mentioned this pull request May 23, 2026

🐛 fix(agent): cross-host update notifications and toasts now fire for agent-side containers (#289) #392

Closed

Uh oh!

Conversation

s-b-e-n-s-o-n commented May 23, 2026

1. Thread the agent's container snapshot through every level of the agent-scoped operation pipeline (a7f633b7)

2. Thread the controller's operationId end-to-end so agent-hosted updates use exactly one row (d38ea2c7)

Tests / coverage

Uh oh!

vercel Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

biggest-littlest left a comment

Choose a reason for hiding this comment

Uh oh!

biggest-littlest left a comment

Choose a reason for hiding this comment

Uh oh!

ALARGECOMPANY left a comment

Choose a reason for hiding this comment

Uh oh!

ALARGECOMPANY left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented May 23, 2026

Codecov Report

Uh oh!

biggest-littlest left a comment

Choose a reason for hiding this comment

Uh oh!

biggest-littlest left a comment

Choose a reason for hiding this comment

Uh oh!

ALARGECOMPANY left a comment

Choose a reason for hiding this comment

Uh oh!

ALARGECOMPANY left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

biggest-littlest left a comment

Choose a reason for hiding this comment

Uh oh!

biggest-littlest left a comment

Choose a reason for hiding this comment

Uh oh!

ALARGECOMPANY left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

biggest-littlest left a comment

Choose a reason for hiding this comment

Uh oh!

ALARGECOMPANY left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

1. Thread the agent's container snapshot through every level of the agent-scoped operation pipeline (`a7f633b7`)

2. Thread the controller's `operationId` end-to-end so agent-hosted updates use exactly one row (`d38ea2c7`)

vercel Bot commented May 23, 2026 •

edited

Loading