Skip to content

πŸ› fix(v1.5): rc.27 β€” agent-hosted update operationId threading + container snapshot (#289)#393

Merged
s-b-e-n-s-o-n merged 7 commits into
mainfrom
feature/v1.5-rc27
May 23, 2026
Merged

πŸ› fix(v1.5): rc.27 β€” agent-hosted update operationId threading + container snapshot (#289)#393
s-b-e-n-s-o-n merged 7 commits into
mainfrom
feature/v1.5-rc27

Conversation

@s-b-e-n-s-o-n
Copy link
Copy Markdown
Contributor

Closes the long-running same-name / cross-host update-state regression on #289. Two coordinated fixes on top of rc.26:

1. Thread the agent's container snapshot through every level of the agent-scoped operation pipeline (a7f633b7)

Cross-host updates silently dropped notification triggers (Pushover, Telegram, etc.) and produced incomplete SSE toasts for containers running on a connected agent. The controller built the agent-scoped operation row from buildAgentOperationBase without the container snapshot, so when markOperationTerminal fired emitTerminalLifecycleEvent, the resulting emitContainerUpdateApplied / emitContainerUpdateFailed carried only {operationId, containerName, containerId}. The notification trigger handler then fell back to findContainerByBusinessId, which compares the agent's bare containerName (tautulli) against the controller's fullName (mediavault_docker_tautulli) and silently dropped β€” the same class of findContainerByBusinessId miss as #385 but on the agent-scoped operation path.

Threads container through buildAgentOperationBase / ensureAgentOperationForTerminal / markAgentOperationTerminal / maybeMarkAgentOperationSucceededFromAppliedPayload / maybeMarkAgentOperationFailedFromFailedPayload, stamping agent: this.name. Existing-row race (dd:update-operation-changed before dd:update-applied) patches container onto the active row via updateOperation before the terminal emit. Adds container to MutableUpdateOperationFields so patch types accept it.

2. Thread the controller's operationId end-to-end so agent-hosted updates use exactly one row (d38ea2c7)

Even with fix #1, a follow-up rc.25 video showed a [mediavault] Container Tautulli update failed β€” Marked failed after exceeding active update TTL (1800000ms) while queued. Pushover arriving ~30 minutes after a successful update. Cause: createAcceptedContainerUpdateRequest mints a controller-side queued row with a crypto.randomUUID() operationId, but AgentTrigger.trigger(container) discarded the runtimeContext so the operationId never reached the agent. The agent minted its own UUID and fired lifecycle events keyed by that id, which the controller routed through toAgentScopedId into a third (agent-scoped) row. The original controller-side queued row was therefore never touched, sat past UPDATE_OPERATION_ACTIVE_TTL_MS, and was force-failed by the TTL sweep β€” which fired the misleading "failed" notification with the row's still-valid container snapshot.

Threads the controller operationId end-to-end:

  • AgentTrigger.trigger / triggerBatch accept and forward runtimeContext
  • AgentClient.runRemoteTrigger / runRemoteTriggerBatch extract per-container operationIds via getRequestedOperationId and include them in the agent payload ({id, name, operationId} single; {...container, operationId} per entry for batches)
  • app/api/trigger.ts runTrigger accepts an operationId in the body (validated by triggerRequestBodySchema) and threads it into requestContainerUpdate
  • app/agent/api/trigger.ts runTriggerBatch extracts per-container operationIds into {operationIds} runtimeContext before forwarding to the local trigger
  • EnqueueContainerUpdateOptions gains an operationId field honored by createAcceptedContainerUpdateRequest (single-container batches only; multi-container batches still mint per-container UUIDs)
  • New AgentClient.resolveAgentOperationId checks the controller's operation store for an existing row at the raw (unscoped) id and reuses it when found, falling back to toAgentScopedId when the agent does not echo a known controller id (backwards-compat with older agents)

The controller-side queued row therefore transitions directly to in-progress and succeeded/failed from the agent's lifecycle events, no parallel agent-scoped row is created, the TTL sweep has nothing stale to fail, and the spurious "update failed" notification disappears.

Tests / coverage

  • 9903 app tests pass with 100% coverage (statements / branches / functions / lines)
  • New tests cover: AgentTrigger runtimeContext forwarding, runRemoteTrigger/Batch operationId payload, agent runTrigger/runTriggerBatch operationId extraction, request-update operationId honoring (single-container only), and AgentClient.resolveAgentOperationId reuse-vs-scope behavior
  • Build clean, lint clean (no new biome findings)

Fixes: #289

…perations

Cross-host updates (#289) silently dropped notification triggers
(Pushover) and produced incomplete SSE toasts for containers running
on a connected agent. The controller built the agent-scoped operation
row from buildAgentOperationBase without the container snapshot, so
when markOperationTerminal fired emitTerminalLifecycleEvent, the
resulting emitContainerUpdateApplied / emitContainerUpdateFailed
carried only {operationId, containerName, containerId}. The trigger
handler then fell back to findContainerByBusinessId, which compares
the agent's bare containerName (`tautulli`) against the controller's
fullName (`mediavault_docker_tautulli`) and silently dropped.

Threads container through buildAgentOperationBase /
ensureAgentOperationForTerminal / markAgentOperationTerminal /
maybeMarkAgentOperationSucceededFromAppliedPayload /
maybeMarkAgentOperationFailedFromFailedPayload, stamping
agent: this.name. Existing-row race (dd:update-operation-changed
before dd:update-applied) patches container onto the active row via
updateOperation before the terminal emit. Adds container to
MutableUpdateOperationFields so patch types accept it.

Fixes: #289
…er Unreleased

Adds an Unreleased entry for the agent-scoped update-operation container
threading fix shipped in a7f633b so the rc.27 release notes pick it up
alongside the rc.26 batch.
…d updates use one row

Agent-hosted container updates (#289 follow-up) silently spawned a second
"update failed" Pushover ~30 min after a successful update because the
controller-side queued row was orphaned. createAcceptedContainerUpdateRequest
minted a queued row with a controller operationId, but AgentTrigger.trigger()
discarded the runtimeContext, so AgentClient.runRemoteTrigger posted {id,name}
without an operationId. The agent then minted its own UUID and fired lifecycle
events keyed by that UUID, which the controller scoped via toAgentScopedId into
a third (agent-scoped) row. The original controller-side queued row was never
touched, sat past UPDATE_OPERATION_ACTIVE_TTL_MS, and was force-failed by the
TTL sweep with "Marked failed... while queued" β€” which fired a misleading
failure notification on a successful update.

Thread the controller operationId end-to-end so one row owns the lifecycle:

- AgentTrigger.trigger / triggerBatch accept and forward runtimeContext.
- AgentClient.runRemoteTrigger / runRemoteTriggerBatch extract per-container
  operationIds via getRequestedOperationId and include them in the agent
  payload ({id,name,operationId} single; {...container,operationId} batches).
- app/api/trigger.ts runTrigger accepts operationId in the body (validated by
  triggerRequestBodySchema) and threads it into requestContainerUpdate.
- app/agent/api/trigger.ts runTriggerBatch extracts per-container operationIds
  into {operationIds} runtimeContext before forwarding to the local trigger.
- EnqueueContainerUpdateOptions gains an operationId field honored by
  createAcceptedContainerUpdateRequest (single-container batches only;
  multi-container batches still mint per-container UUIDs).
- AgentClient.resolveAgentOperationId checks the controller's operation store
  for an existing row at the raw (unscoped) id and reuses it when found,
  falling back to toAgentScopedId when the agent does not echo a known
  controller id (backwards-compat with older agents).

9903 tests pass, 100% coverage on app. Adds targeted tests for AgentTrigger
runtimeContext forwarding, runRemoteTrigger/Batch operationId payload,
agent runTrigger/runTriggerBatch operationId extraction, request-update
operationId honoring, and AgentClient.resolveAgentOperationId reuse-vs-scope
behavior.

Fixes: #289
@vercel
Copy link
Copy Markdown

vercel Bot commented May 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
drydock-demo Ready Ready Preview, Comment May 23, 2026 2:32pm
drydock-website Ready Ready Preview, Comment May 23, 2026 2:32pm

Copy link
Copy Markdown
Member

@biggest-littlest biggest-littlest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving β€” closes the two #289 regressions: (1) agent container snapshot threading so notification triggers resolve agent-hosted containers; (2) controller operationId threaded end-to-end so the orphaned queued row + spurious TTL-failure notification is eliminated. 100% coverage held.

Copy link
Copy Markdown
Member

@biggest-littlest biggest-littlest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving β€” closes the two #289 regressions: (1) agent container snapshot threading so notification triggers resolve agent-hosted containers; (2) controller operationId threaded end-to-end so the orphaned queued row + spurious TTL-failure notification is eliminated. 100% coverage held.

Copy link
Copy Markdown
Member

@ALARGECOMPANY ALARGECOMPANY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving β€” operationId threading is clean (single source of truth for the lifecycle row), and the backwards-compat fallback in resolveAgentOperationId preserves older-agent behavior. Coverage held.

ALARGECOMPANY
ALARGECOMPANY previously approved these changes May 23, 2026
Copy link
Copy Markdown
Member

@ALARGECOMPANY ALARGECOMPANY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving β€” operationId threading is clean (single source of truth for the lifecycle row), and the backwards-compat fallback in resolveAgentOperationId preserves older-agent behavior. Coverage held.

Comment thread app/agent/api/trigger.ts Fixed
@codecov
Copy link
Copy Markdown

codecov Bot commented May 23, 2026

Codecov Report

βœ… All modified and coverable lines are covered by tests.

πŸ“’ Thoughts on this report? Let us know!

CodeQL flagged the new per-container operationIds map in
app/agent/api/trigger.ts:runTriggerBatch as a prototype-pollution
risk: container.id arrives in the request body and was used directly
as a property name on a plain {} (high severity, js/remote-property-injection).

Replace the map with an Object.create(null) instance and reject
container.id values that are non-string, empty, or one of __proto__ /
constructor / prototype before using them as property names. Adds two
targeted tests covering the forbidden-key and non-string-id branches.

Tests: 9905 pass, 100% coverage held.
Copy link
Copy Markdown
Member

@biggest-littlest biggest-littlest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after CodeQL fix (af82937) β€” null-prototype guard + forbidden-key validation on the new operationIds map closes the js/remote-property-injection finding. Added tests cover the forbidden-key and non-string-id branches. Coverage held.

Copy link
Copy Markdown
Member

@biggest-littlest biggest-littlest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after CodeQL fix (af82937) β€” null-prototype guard + forbidden-key validation on the new operationIds map closes the js/remote-property-injection finding. Added tests cover the forbidden-key and non-string-id branches. Coverage held.

Copy link
Copy Markdown
Member

@ALARGECOMPANY ALARGECOMPANY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after CodeQL fix β€” null-prototype + forbidden-key guard is the right pattern. Tests cover the new branches.

ALARGECOMPANY
ALARGECOMPANY previously approved these changes May 23, 2026
Copy link
Copy Markdown
Member

@ALARGECOMPANY ALARGECOMPANY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after CodeQL fix β€” null-prototype + forbidden-key guard is the right pattern. Tests cover the new branches.

Comment thread app/agent/api/trigger.ts Fixed
Replace the null-prototype object with a Map in
app/agent/api/trigger.ts:runTriggerBatch.  CodeQL's
js/remote-property-injection query treats obj[userKey] = ... as a
PropWrite sink regardless of whether obj has Object.prototype on its
chain β€” Object.create(null) does not satisfy the query.

Per CodeQL's own QHelp ('This case should be avoided whenever possible
by using the ECMAScript 2015 Map instead'), Map writes are simply not
recognized as a sink, so this is the canonical mitigation.

Consumer (app/triggers/providers/docker/update-runtime-context.ts) now
accepts Map | Record for operationIds, branches on instanceof Map for
the Map path, and uses Object.hasOwn + Reflect.get for the legacy
Record path as defense-in-depth (the legacy producer in
request-update.ts mints UUIDs internally, but the guard costs nothing
and locks the lookup to own keys).

Tests: 9907 pass, 100% coverage.  Adds explicit prototype-pollution
assertion (Object.prototype must remain unmodified after a malicious
__proto__/constructor/prototype payload).
Copy link
Copy Markdown
Member

@biggest-littlest biggest-littlest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after CodeQL Map fix (b76e06d). Per CodeQL's own QHelp for js/remote-property-injection, Map is the canonical mitigation β€” Map writes aren't recognized as a sink. Consumer accepts Map | Record with Object.hasOwn + Reflect.get on the Record path as defense-in-depth. Coverage 100%.

Copy link
Copy Markdown
Member

@biggest-littlest biggest-littlest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after CodeQL Map fix (b76e06d). Per CodeQL's own QHelp for js/remote-property-injection, Map is the canonical mitigation β€” Map writes aren't recognized as a sink. Consumer accepts Map | Record with Object.hasOwn + Reflect.get on the Record path as defense-in-depth. Coverage 100%.

ALARGECOMPANY
ALARGECOMPANY previously approved these changes May 23, 2026
Copy link
Copy Markdown
Member

@ALARGECOMPANY ALARGECOMPANY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving β€” Map is the right call per CodeQL docs. Defense-in-depth on the legacy Record path is a nice bonus.

Comment thread app/triggers/providers/docker/update-runtime-context.test.ts Fixed
)

The previous '__proto__: string' construction in the prototype-chain
test for operationIds triggered CodeQL's js/invalid-prototype-value
rule and didn't actually exercise the path it claimed to (string
values at __proto__ in an object literal are silently ignored).

Replace with a regular Object.create(prototypeWithDataProp) so the
consumer's hasOwn guard is genuinely exercised against an inherited
data property.

Coverage held at 100%.
…op Trivy banner

- Roll v1.4.0–1.4.4 into a single v1.4.x row and trim per-version
  prose; the README roadmap is a summary, CHANGELOG.md owns the detail.
- Wrap Recent Updates and Roadmap in <details> so the long lists are
  collapsed by default; centered h2 preserved inside <summary>.
- Drop the Trivy supply-chain advisory blockquote at the top β€” drydock
  was confirmed unaffected (Actions not used, binary pinned to v0.69.3,
  SHA-pinned workflows), enough time has passed, and the full advisory
  remains at /security/trivy-supply-chain-march-2026.
Copy link
Copy Markdown
Member

@biggest-littlest biggest-littlest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving after rc.27 test-fix + README polish. CodeQL js/invalid-prototype-value cleared by 6a9b9e6 (replace the __proto__: 'evil-proto' object-literal β€” which is silently dropped by the engine β€” with Object.create(prototypeWithDataProp) so the consumer's Object.hasOwn guard is genuinely exercised against an inherited data property). 6c4fce6 is docs-only README cleanup. Coverage held at 100%.

Copy link
Copy Markdown
Member

@ALARGECOMPANY ALARGECOMPANY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving β€” fix is the right shape (Object.create(prototypeWithDataProp) actually exercises the hasOwn guard against an inherited data property; the previous __proto__: 'evil-proto' literal was silently dropped). Docs-only README polish on top. 100% coverage held.

@s-b-e-n-s-o-n s-b-e-n-s-o-n merged commit 74de57c into main May 23, 2026
25 checks passed
@s-b-e-n-s-o-n s-b-e-n-s-o-n deleted the feature/v1.5-rc27 branch May 23, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression with updating container state

4 participants