Releases: CodesWhat/drydock
v1.5.0-rc.27
Full Changelog: v1.5.0-rc.26...v1.5.0-rc.27
[1.5.0-rc.27] — 2026-05-24
Fixed
- #289 — Agent-hosted container updates no longer leave an orphaned queued operation row on the controller. The 30-minute TTL sweep used to force-fail that row into a misleading "update failed" Pushover/Telegram notification long after the update had actually succeeded. A user running drydock in a controller + agent topology reported on rc.25 that an "Update All" of Tautulli on two hosts produced a success notification only for the controller-host container. The agent-host container's success notification was missing and, about 30 minutes later, a second Pushover arrived saying `[mediavault] Container Tautulli update failed — Marked failed after exceeding active update TTL (1800000ms) while queued.` even though the update had in fact succeeded on the agent.

  Cause: when the controller queues a container update via `createAcceptedContainerUpdateRequest` (`app/updates/request-update.ts`), it mints a controller-side `operationId` and inserts a `queued` row; the dispatcher then calls `entry.trigger.trigger(entry.container, { operationId })`. For containers hosted on an agent the trigger is `AgentTrigger`, whose `trigger(container)` previously accepted only the container and discarded the `runtimeContext`. `AgentClient.runRemoteTrigger` posted `{id, name}` to the agent without the operationId, so the agent's `/api/triggers/:type/:name` endpoint called `requestContainerUpdate` with no operationId and minted its own row. The agent's `dd:update-applied` / `dd:update-operation-changed` events then arrived back at the controller carrying the agent-side id, which the controller routed through `toAgentScopedId` into a third, agent-scoped row (`agent-<name>-<remote-id>`). The original controller-side queued row was therefore never touched. It sat `queued` past the `UPDATE_OPERATION_ACTIVE_TTL_MS` deadline in `app/store/update-operation.ts:295-300` and was force-failed by the TTL sweep, which fired the misleading "failed" notification with the row's still-valid container snapshot (hence the correct `[mediavault]` agent prefix).

  The fix threads the controller's `operationId` end-to-end so that a single row is the source of truth for the whole lifecycle: `AgentTrigger.trigger` / `triggerBatch` now accept and forward `runtimeContext`; `AgentClient.runRemoteTrigger` / `runRemoteTriggerBatch` extract per-container operationIds via the existing `getRequestedOperationId` helper and include them in the agent payload (`{id, name, operationId}` for single triggers, `{...container, operationId}` per entry for batches); the agent-side controller `runTrigger` accepts an `operationId` in the request body (validated by `triggerRequestBodySchema`) and threads it into `requestContainerUpdate`; the agent-side batch endpoint extracts per-container operationIds into an `{operationIds}` runtimeContext before forwarding to the local trigger; `EnqueueContainerUpdateOptions` gains an `operationId` field honored by `createAcceptedContainerUpdateRequest` (single-container batches only; multi-container batches still mint per-container UUIDs); and a new `AgentClient.resolveAgentOperationId` helper checks the controller's operation store for an existing row at the raw (unscoped) id and reuses it when found. It falls back to the `toAgentScopedId` form only when the agent does not echo a known controller id, preserving backwards compatibility with older agents. The controller-side queued row therefore transitions directly to `in-progress` and then `succeeded`/`failed` from the agent's lifecycle events, no parallel agent-scoped row is created, the TTL sweep has nothing stale to fail, and the spurious "update failed" notification disappears.
- #289 — Update-applied and update-failed notification triggers (Pushover, Telegram, etc.) and UI success toasts no longer silently drop for containers running on a connected agent. A user running drydock in a controller + agent topology reported on rc.25 that an "Update All" across two hosts produced the success toast and Pushover notification only for the container on the controller host, never for the same-name container on the agent host.

  Cause: when the agent finishes an update, it sends a `dd:update-applied` SSE payload to the controller carrying a full `container` snapshot. The controller's `AgentClient.handleEvent` routes this through `maybeMarkAgentOperationSucceededFromAppliedPayload` → `markAgentOperationTerminal` → `ensureAgentOperationForTerminal` → `updateOperationStore.insertOperation` + `markOperationTerminal`, but `buildAgentOperationBase` in `app/agent/AgentClient.ts` constructed the inserted row from `{id, kind, containerName, containerId, newContainerId}` only, so the container snapshot was discarded. When `markOperationTerminal` then fired `emitTerminalLifecycleEvent` (`app/store/update-operation.ts`), the resulting `emitContainerUpdateApplied` / `emitContainerUpdateFailed` payload built by `buildTerminalLifecycleEventBase` lacked `container`. The notification handler `handleContainerUpdateAppliedEvent` (`app/triggers/providers/Trigger.ts`) then fell back to `findContainerByBusinessId(containerName)`, which compares the agent's bare `containerName` (e.g. `tautulli`) against the controller-side `fullName` (e.g. `mediavault_docker_tautulli`) and silently dropped the event. This is the same class of `findContainerByBusinessId` miss as #385, but on the agent-scoped operation path that #385 did not cover.

  The fix threads the agent's container snapshot through every level of the agent-scoped operation pipeline (`buildAgentOperationBase`, `ensureAgentOperationForTerminal`, `markAgentOperationTerminal`, `maybeMarkAgentOperationSucceededFromAppliedPayload`, and `maybeMarkAgentOperationFailedFromFailedPayload`), stamping `agent: this.name` so the controller's view of the container is consistent. The race in which `dd:update-operation-changed` arrives before `dd:update-applied` is handled by patching the container snapshot onto the existing active row via `updateOperation` before the terminal emit runs; this happens only when the existing row lacks a container and never overwrites an existing snapshot. `container` is added to `MutableUpdateOperationFields` in `app/store/update-operation.ts` so terminal and active patches accept it.

  The store's terminal-lifecycle emit therefore naturally carries the agent's container into `emitContainerUpdateApplied` / `emitContainerUpdateFailed`, the `payloadContainer` shortcut in the trigger handler succeeds, and both the notification trigger and the SSE toast fire end-to-end on the controller for agent-originated updates.
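The reuse-then-fall-back rule for operation ids echoed by an agent can be sketched as follows. This is a minimal illustration, assuming only that the store can answer "does a row with this id exist?" and that the agent-scoped id has the `agent-<name>-<remote-id>` form described above. `OperationStoreLike` and the free-function signatures are hypothetical; they are not drydock's actual `AgentClient.resolveAgentOperationId` API.

```typescript
// Hypothetical minimal view of the controller's operation store; drydock's
// real store exposes a richer API.
interface OperationStoreLike {
  has(operationId: string): boolean;
}

// Agent-scoped id format as described in the notes: agent-<name>-<remote-id>.
function toAgentScopedId(agentName: string, remoteId: string): string {
  return `agent-${agentName}-${remoteId}`;
}

// Reuse the controller's own row when the agent echoed a known controller id;
// otherwise fall back to the agent-scoped id (older agents that do not echo it).
function resolveAgentOperationId(
  store: OperationStoreLike,
  agentName: string,
  remoteId: string,
): string {
  return store.has(remoteId) ? remoteId : toAgentScopedId(agentName, remoteId);
}

// Example: the controller minted "op-123" when it queued the update.
const controllerStore: OperationStoreLike = new Set<string>(['op-123']);

console.log(resolveAgentOperationId(controllerStore, 'mediavault', 'op-123')); // op-123
console.log(resolveAgentOperationId(controllerStore, 'mediavault', 'f00d')); // agent-mediavault-f00d
```

Because the lookup uses the raw id first, a new agent that echoes the controller's id lands on the original queued row, while an older agent that mints its own id still gets a distinct, namespaced row.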
v1.5.0-rc.26
Full Changelog: v1.5.0-rc.25...v1.5.0-rc.26
[1.5.0-rc.26] — 2026-05-22
Fixed
- Image reference construction no longer risks silently corrupting references whose image name contains a `/v2` path segment. `Registry.getImageFullName` and the controller-mode fallback in `resolveContainerImageFullName` both applied `.replace(/\/v2/, '')` to the fully concatenated `registryUrl/imageName:tag` string. Because the regex was unanchored and non-global, when the image name contained a `/v2` segment (e.g. `library/v2/tool`) the strip could remove it from the image name rather than from the registry URL, producing a silently wrong reference handed to Trivy. The fix extracts a shared pure helper, `buildImageReference` (`app/registries/image-reference.ts`), that cleans the registry URL before concatenation using anchored regexes (`^https?:\/\/` and `/v2\/?$`), so the URL scheme and the trailing `/v2` API path are removed without touching anything in the image name. Both `Registry.getImageFullName` and the fallback branch of `resolveContainerImageFullName` now delegate to this helper, eliminating the duplicate logic.
- #386 — Agents intermittently showing 0 running containers in the controller UI: a second recurrence that the rc.25 fix did not close. The rc.25 fix suppressed the authoritative watcher snapshot whenever container enumeration failed or per-container enrichment errors dropped containers, but the recurrence reported on rc.25 is a different failure mode: a cold-start race between the controller's handshake and the agent's first watch cycle. When the controller's `AgentClient` (re)connects to an agent's SSE stream, it handshakes immediately via `GET /api/containers`. If the agent's `watchatstart` cron has not yet finished its first run, the agent's in-memory store is still empty and the handshake legitimately receives 0 containers (the agent log shows `Handshake successful. Received 0 containers.` about 5 s before `Cron finished (4 containers watched, 0 errors)`). The handshake then fires `emitAgentConnected`, the UI re-fetches `/api/v1/agents`, and the agent's running-container count renders 0. When the agent's cron completes moments later, it pushes a `dd:watcher-snapshot`, and `AgentClient.handleWatcherSnapshotEvent` ingests the four containers into the controller store correctly. Nothing told the UI to refresh, however, because `AgentsView` only re-fetches the agent summary on `agent-status-changed` / `connected` / `resync-required` events, not on `container-added` / `container-updated`. The stale 0 therefore persisted until an unrelated reconnect event (such as an agent restart) fired. The fix adds a dedicated `AgentStatsChanged` event: `app/event/index.ts` gains `emitAgentStatsChanged` / `registerAgentStatsChanged` (mirroring the existing `AgentConnected` pair); `AgentClient.handleWatcherSnapshotEvent` now emits `emitAgentStatsChanged({ agentName })` after every completed watcher snapshot; `app/api/sse.ts` broadcasts it to UI SSE clients as `dd:agent-stats-changed`; and `ui/src/stores/eventStream.ts` maps that to the existing `agent-status-changed` bus event. A completed agent watch cycle therefore always refreshes the controller's agent-summary count, even when the handshake raced ahead of the agent's first cron.
- #342 — A container is no longer shown as "update available" with a blank target version after a transient registry error. `hasRawUpdate` in `app/model/container.ts` compared `transformTag(image.tag.value)` against `transformTag(result.tag)` without guarding against an undefined `result.tag`. When a registry scan failed mid-flight (for example a Docker Hub or GHCR `429`) and left a container `result` present but its `tag` unset, `transformTag(undefined)` returned `undefined`, the `localTag !== remoteTag` comparison evaluated true, and the container was flagged `updateAvailable` with an `unknown` update kind, which the UI renders as an update with no target version (the reporter saw this on `immich_redis`). `hasRawUpdate` now performs the tag comparison only when both `image.tag.value` and `result.tag` are defined, matching the existing guard in `getRawTagUpdate`. Digest-only updates are unaffected: a container with an undefined `result.tag` but a genuine digest change still reports the digest update.
- #386 follow-through — The controller's agent-summary container count now also refreshes on Docker-event-driven container changes, not only on completed cron cycles. The initial #386 fix emitted `emitAgentStatsChanged` from `AgentClient.handleWatcherSnapshotEvent`, the cron-watch path. The controller also updates its store for an agent's containers outside that path: individual add/remove/update events from the agent's Docker event stream between cron cycles (`handleContainerChangeEvent`, `handleContainerRemovedEvent`), the controller-initiated `watch()` path, and the per-container controller-initiated `watchContainer()` path. None of these emitted the stats-changed signal, so a container started or stopped on an agent host could leave the `AgentsView` running-container count stale until the next 6-hourly cron. All four paths now emit `emitAgentStatsChanged` after mutating the controller store, keeping the count current in real time.
- #342 — GitHub release-notes lookups now survive GitHub's secondary rate limit instead of giving up on the first burst. Drydock authenticates its api.github.com release-notes requests by reusing the configured GHCR token, but a watch cycle still fans out a lookup for every watched container at once and trips GitHub's secondary rate limit, a `403` that GitHub returns to authenticated callers who send too many requests in a burst. The shared retry helper (`app/registries/http-retry.ts`) only retried `429`/`503`, so the secondary-limit `403` was never retried: the provider logged `GitHub release notes lookup is rate-limited` and returned nothing. `withRetry` gains two optional, opt-in hooks, `retryPredicate` (retry a status outside `retryableStatuses`) and `retryDelayMs` (per-attempt delay override), leaving every existing caller unchanged. `GithubProvider` classifies a `403` as a secondary rate limit only when it carries a `retry-after` header or `x-ratelimit-remaining: 0`, retries those (honouring `retry-after` / `x-ratelimit-reset` for the delay), and leaves a genuine `403` authorization failure failing fast as before. Once retries are exhausted, the provider arms a short module-level cooldown during which further release-notes lookups are skipped. The cooldown is driven by GitHub's own retry hint but floored at the 60 s default, so a `retry-after: 0` hint cannot produce an already-expired cooldown. A single cron cycle therefore no longer hammers an already-tripped limit container after container. The rate-limit warning now also records whether the request was authenticated.
- #342 — The registry-error tooltip on the Containers view now names the registry that failed. When a registry tag lookup errors (for example a `429` rate limit), the container shows a registry-error badge whose tooltip previously rendered only the raw transport message (`Registry error: Request failed with status code 429`), with no indication of which registry was queried. `registryErrorTooltip` in `ui/src/views/ContainersView.vue` now derives the registry hostname from the container's `registryUrl` and renders it through a new `registryError.detailWithRegistry` i18n string (`{registryHost} — {error}`), e.g. `ghcr.io — Request failed with status code 429`. Containers whose `registryUrl` is absent or unparseable fall back to the original message unchanged.
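The anchored cleanup described for `buildImageReference` can be illustrated with a short sketch. It uses the two anchored regexes quoted in the notes and the `@` separator for digests mentioned in the rc.24 #374 entry. The three-argument signature and the assumption that a digest is recognised by a `sha256:` prefix are hypothetical; drydock's real helper may take different inputs.

```typescript
// Remove the URL scheme and the trailing registry v2 API path from the
// registry URL only. Both regexes are anchored, so they can never match
// inside the image name.
function cleanRegistryUrl(registryUrl: string): string {
  return registryUrl.replace(/^https?:\/\//, '').replace(/\/v2\/?$/, '');
}

// Hypothetical signature: joins the cleaned registry host with the image name,
// using '@' for digest references (assumed here to start with 'sha256:')
// and ':' for tags.
function buildImageReference(registryUrl: string, imageName: string, tagOrDigest: string): string {
  const separator = tagOrDigest.startsWith('sha256:') ? '@' : ':';
  return `${cleanRegistryUrl(registryUrl)}/${imageName}${separator}${tagOrDigest}`;
}

console.log(buildImageReference('https://registry-1.docker.io/v2', 'dgtlmoon/sockpuppetbrowser', '0.0.3'));
// registry-1.docker.io/dgtlmoon/sockpuppetbrowser:0.0.3
console.log(buildImageReference('https://ghcr.io/v2/', 'library/v2/tool', '1.0'));
// ghcr.io/library/v2/tool:1.0
```

Cleaning the URL before concatenation is what makes the `/v2` inside `library/v2/tool` safe. The regex only ever sees the registry part, so there is no "first occurrence" ambiguity left to get wrong.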
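The 403 classification and cooldown rules described for `GithubProvider` can be sketched as plain functions. This sketch assumes lower-cased header names and reads the headers as GitHub documents them: `retry-after` is a number of seconds, and `x-ratelimit-reset` is a UTC epoch time in seconds. The function names and the header-map shape are hypothetical illustrations, not drydock's `withRetry` / `retryPredicate` / `retryDelayMs` API.

```typescript
// Response headers keyed by lower-cased name (hypothetical shape).
type ResponseHeaderMap = Record<string, string | undefined>;

const DEFAULT_COOLDOWN_MS = 60_000;

// A 403 is treated as GitHub's secondary rate limit only when GitHub attached a
// retry hint; any other 403 is a genuine authorization failure and fails fast.
function isSecondaryRateLimit(status: number, headers: ResponseHeaderMap): boolean {
  if (status !== 403) return false;
  return headers['retry-after'] !== undefined || headers['x-ratelimit-remaining'] === '0';
}

// Delay GitHub asks for: `retry-after` is in seconds; `x-ratelimit-reset` is a
// UTC epoch timestamp in seconds. Returns undefined when neither hint is usable.
function retryHintMs(headers: ResponseHeaderMap, nowMs: number): number | undefined {
  const retryAfter = Number(headers['retry-after']);
  if (headers['retry-after'] !== undefined && Number.isFinite(retryAfter)) {
    return retryAfter * 1000;
  }
  const reset = Number(headers['x-ratelimit-reset']);
  if (headers['x-ratelimit-reset'] !== undefined && Number.isFinite(reset)) {
    return Math.max(0, reset * 1000 - nowMs);
  }
  return undefined;
}

// Cooldown armed once retries are exhausted: follows GitHub's hint but is
// floored at the default, so `retry-after: 0` cannot yield an expired cooldown.
function cooldownMs(headers: ResponseHeaderMap, nowMs: number): number {
  return Math.max(DEFAULT_COOLDOWN_MS, retryHintMs(headers, nowMs) ?? 0);
}

console.log(isSecondaryRateLimit(403, { 'retry-after': '30' })); // true
console.log(isSecondaryRateLimit(403, {})); // false
console.log(cooldownMs({ 'retry-after': '0' }, Date.now())); // 60000
```

The floor in `cooldownMs` applies only to the post-exhaustion cooldown. The per-attempt retry delay can still follow a short `retry-after` hint directly.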
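The guarded comparison in the `hasRawUpdate` fix can be sketched with flattened inputs. The `UpdateInputs` shape and the identity `transformTag` stand-in are hypothetical. Treating the digest check as an independent branch is an assumption based on the note that digest-only updates still report.

```typescript
// Hypothetical flattened inputs; drydock reads these from image.tag.value,
// result.tag and the image/result digests.
interface UpdateInputs {
  localTag?: string;
  remoteTag?: string;
  localDigest?: string;
  remoteDigest?: string;
}

// Stand-in for drydock's transformTag (identity here; the real one applies the
// user's tag transform).
const transformTag = (tag: string): string => tag;

function hasRawUpdate(c: UpdateInputs): boolean {
  // Compare tags only when both sides are known: a failed scan that left
  // remoteTag unset must not read as "update available".
  const tagChanged =
    c.localTag !== undefined &&
    c.remoteTag !== undefined &&
    transformTag(c.localTag) !== transformTag(c.remoteTag);
  // Digest changes are evaluated independently, so digest-only updates still count.
  const digestChanged =
    c.localDigest !== undefined &&
    c.remoteDigest !== undefined &&
    c.localDigest !== c.remoteDigest;
  return tagChanged || digestChanged;
}

console.log(hasRawUpdate({ localTag: '7.2' })); // false (scan failed, no remote tag)
console.log(hasRawUpdate({ localTag: 'latest', localDigest: 'sha256:a', remoteDigest: 'sha256:b' })); // true
```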
v1.5.0-rc.25
[1.5.0-rc.25] — 2026-05-21
Fixed
- #371 — The Containers "Group By Stack" view no longer dissolves a multi-container stack into "Ungrouped" while its last container is mid-update. The flatten rule in `groupedContainers` (`ui/src/views/ContainersView.vue`) previously keyed off the transient live container count (`buckets[key].length === 1`). During a Docker recreate, a 2-container stack momentarily shows only 1 live container (old removed, new not yet added), so the rule fired and dropped the stack header. The fix adds a `groupAssignedSizeMap` ref (populated by `loadGroups()` from the groups API response and reset to `{}` on error) that records each group's API-assigned member count. The flatten condition is now `buckets[key].length === 1 && groupAssignedSizeMap.value[key] === 1`. This is a strict equality check, so stacks whose assigned size is greater than 1, or that are transiently absent from the API response, are never flattened mid-update. Genuine single-container stacks (assigned size exactly 1) are still flattened as before (GitHub Discussion #179).
- #386 — Agents intermittently showing 0 running containers in the controller UI: a recurrence of #362 that the rc.20 fix did not fully close. The rc.20 guard introduced a `containerEnumerationFailed` flag in `Docker.watch()` (`app/watchers/providers/docker/Docker.ts`) that suppresses the authoritative `emitWatcherSnapshot` when `getContainers()` itself throws. However, `getContainers()` does not throw on per-container enrichment failures: `addImageDetailsToContainer()` is called for each watched container, and any container whose enrichment throws is caught (`.catch((error) => error)`) and then silently filtered out by `.filter(result => !(result instanceof Error) && result != null)`. A transient Docker / socket-proxy hiccup during image inspect can therefore cause `getContainers()` to return a short or empty array without throwing. The `containerEnumerationFailed` guard does not fire, `watch()` emits an authoritative `emitWatcherSnapshot` with the degraded container list, and the controller's `AgentClient.handleWatcherSnapshotEvent` prunes every container not in that list, wiping the agent's view. The agent's own store is preserved because its local prune re-confirms each container via `inspect()`, which is why the agent kept reporting its containers and why a restart's handshake re-synced the controller. The fix extends the snapshot suppression in two steps: `getContainers()` now accepts an optional `diagnostics` out-parameter and writes the number of containers dropped due to enrichment errors into `diagnostics.enrichmentErrors`; `watch()` creates and passes this object on every call, logs a `Container enumeration degraded` warning when the count is non-zero, and suppresses `emitWatcherSnapshot` whenever either `containerEnumerationFailed` is true or `enumerationDiagnostics.enrichmentErrors > 0`. Per-container reports still emit as before; only the authoritative controller-side prune is deferred until a fully clean watch cycle.
- #385 — Telegram, Pushover, and other notification triggers no longer silently swallow `update-applied` and `update-failed` events after a compose recreate or on multi-agent deployments. When an update routed through the operation queue completed, the terminal lifecycle event (`update-applied` on success, `update-failed` on failure/rolled-back) was emitted from `app/store/update-operation.ts:buildTerminalLifecycleEventBase` with only `containerName` / `containerId` / `operationId` on the payload and no `container` object. Notification handlers in `app/triggers/providers/Trigger.ts` fell back to `findContainerByBusinessId(containerName)`, which missed during the ~8 s window between the old container being removed and the new one being re-watched after a compose recreate; the handler then dropped the event with a `No container found for update-applied event => ignore` debug log. This was the same class of race as #355, but for the operation-queue-driven path that bypasses `UpdateLifecycleExecutor`'s direct emit.

  The fix persists a snapshot of the `Container` on the operation entry at enqueue time (`app/updates/request-update.ts:createAcceptedContainerUpdateRequest`), and `buildTerminalLifecycleEventBase` now forwards that snapshot on the terminal-lifecycle payload for both `update-applied` and `update-failed`, closing the race for compose successes and failures alike. The agent SSE wire was also extended to forward the container snapshot end-to-end, so multi-agent deployments get the same fix: `sanitizeUpdateAppliedPayloadForAgentSse` and `sanitizeUpdateFailedPayloadForAgentSse` in `app/agent/api/event.ts` include `container` when present (previously the payload was stripped to scalars only), and the controller's `AgentClient.parseUpdateFailedEventPayload` accepts an inbound container and decorates it with the source `agent` name, mirroring the existing applied-path behaviour.

  The snapshot is internal-only: a new `toApiUpdateOperation` helper in `app/store/update-operation.ts` strips it before operations are serialised through `GET /api/v1/update-operations/:id`, `GET /api/containers/:id/update-operations`, and `POST /api/operations/:id/cancel`, so container labels and `details.env` are not exposed to API consumers.
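The counting out-parameter and the clean-cycle rule from the second #386 entry can be sketched as follows. The generic `getContainers(listed, enrich, diagnostics)` signature and `shouldEmitWatcherSnapshot` are hypothetical simplifications of `Docker.getContainers()` / `Docker.watch()`. Only the catch-then-filter pattern and the `enrichmentErrors` field come from the notes.

```typescript
// Out-parameter filled in by getContainers (field name as described in the notes).
interface EnumerationDiagnostics {
  enrichmentErrors: number;
}

// Enriches every listed container; per-container failures are caught and
// filtered out as before, but are now also counted into `diagnostics`.
async function getContainers<T, R extends object>(
  listed: T[],
  enrich: (item: T) => Promise<R | undefined>,
  diagnostics?: EnumerationDiagnostics,
): Promise<R[]> {
  const results = await Promise.all(
    listed.map((item) =>
      enrich(item).catch((error: unknown) =>
        error instanceof Error ? error : new Error(String(error)),
      ),
    ),
  );
  if (diagnostics) {
    diagnostics.enrichmentErrors += results.filter((r) => r instanceof Error).length;
  }
  return results.filter((r): r is R => !(r instanceof Error) && r != null);
}

// watch() only emits the authoritative (pruning) snapshot after a fully clean cycle.
function shouldEmitWatcherSnapshot(
  containerEnumerationFailed: boolean,
  diagnostics: EnumerationDiagnostics,
): boolean {
  return !containerEnumerationFailed && diagnostics.enrichmentErrors === 0;
}
```

A short result list alone cannot tell "containers were removed" apart from "enrichment failed". The explicit counter carries that distinction from enumeration to the snapshot decision without changing what `getContainers()` returns.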
v1.5.0-rc.24
[1.5.0-rc.24] — 2026-05-17
Changed
- Translations refreshed from Crowdin (commit `202f3d83`). Human translations were synced from Crowdin for the ~110-string rc.23 i18n extraction sweep, updating the 16 non-English locales across the `appShell`, `containerComponents`, `listViews`, `sharedComponents`, `configView`, `agentsView`, and `notificationOutboxView` namespaces. Strings that previously fell back to English now render in each locale.
Fixed
- #370 — The Containers list "Version" column again shows the human-readable image tag for floating-tag + digest-watch containers, restoring the #356 fix that rc.20 inadvertently reverted. The rc.20 #342 follow-up (commit `b40d3db8`) added a visible `sha256:… → sha256:…` digest pair to the Containers table "Version" cell and card body for all `updateKind === 'digest'` containers that are not digest-pinned. The intent was to surface the digest transition for hybrid containers where both the tag and the underlying image layer changed simultaneously. However, the change cast too wide a net: it also applied to floating-tag + digest-watch containers (e.g. `prom/prometheus:latest`, `linuxserver/plex` with a transform tag), exactly the rows that #356 fixed to show the human-readable tag instead of raw digest strings. The `updateKind === 'digest' && !isDigestPinned` branch of the table version cell and card body in `ui/src/components/containers/ContainersGroupedViews.vue` has been restored to the rc.19 behaviour: the version cell renders `c.currentTag` as a `CopyableTag` (with the full digest delta in the cell tooltip), and the card body shows only the update-state badge (with the digest delta in the badge tooltip). The digest transition remains visible through the adjacent "kind" column update-state indicator and the container detail panels. Digest-pinned containers (where `isDigestPinned` is true) are unaffected and continue to show the `sha256:… → sha256:…` pair directly in the cell.
- #374 — Security scans no longer hand Trivy a raw registry v2 API URL, which had caused every scan in controller mode to fail. `resolveContainerImageFullName` (`app/api/container/shared.ts`), used by both the security scan scheduler and the container API, falls back to composing the image reference directly from `container.image.registry.url` whenever the container's registry component is not present in the controller's registry state. That is the normal situation in controller mode (`DD_LOCAL_WATCHER=false`), where registries are configured on the agents rather than on the controller. `registry.url` is stored in the registry v2 API base form (e.g. `https://registry-1.docker.io/v2`), so the fallback produced references such as `https://registry-1.docker.io/v2/dgtlmoon/sockpuppetbrowser:0.0.3`; Trivy then interpreted the scheme as a hostname and every scan failed with `dial tcp: lookup https`. The fallback now mirrors `Registry.getImageFullName`: it strips the URL scheme and the `/v2` path segment and uses an `@` separator for digest references, yielding a plain `registry-1.docker.io/dgtlmoon/sockpuppetbrowser:0.0.3` reference. Containers whose registry component is available are unaffected; they already resolved through the correct `getImageFullName` path.
v1.5.0-rc.23
[1.5.0-rc.23] — 2026-05-16
Added
- Self-update now works when Drydock reaches the Docker daemon over a TCP host, not only through a bind-mounted `/var/run/docker.sock` (commit `fc34ffb9`). The self-update helper container (the short-lived container that outlives Drydock to stop the old instance, health-check the replacement, and commit or roll back) was hardcoded to a bind-mounted Unix socket. It aborted with `Self-update requires the Docker socket to be bind-mounted` whenever Drydock's watcher was configured with a TCP `host`. That is the normal setup when a Docker socket proxy (such as sockguard or `docker-socket-proxy`) mediates daemon access, so self-update was unavailable for those deployments even though every other container updated correctly. `resolveHelperDockerConnection` now inspects the watcher's Dockerode connection: a TCP host produces a TCP helper that is attached to Drydock's own Docker network (the container's `NetworkMode` is cloned so the helper can resolve the proxy by DNS) and receives `DD_SELF_UPDATE_DOCKER_HOST` / `DD_SELF_UPDATE_DOCKER_PORT` / `DD_SELF_UPDATE_DOCKER_PROTOCOL` instead of a socket bind mount. `runSelfUpdateController` builds a TCP Dockerode client from those variables and skips the socket-only API-version probe and redirect guard. The bind-mounted-socket path is unchanged. When self-update runs through a filtering socket proxy, the Drydock container must carry the proxy's ownership label so the helper is permitted to stop and replace it; see `content/docs/current/configuration/self-update/index.mdx`.
- The per-container Update button is locked with a `Self-update unavailable` indicator when Drydock cannot update itself in the current deployment (commit `cf777280`). A new hard `self-update-unavailable` update-eligibility blocker is raised for the Drydock self-container when an update is available but self-update can run neither over a bind-mounted socket nor over a TCP host, i.e. the watcher uses a Unix socket and `/var/run/docker.sock` is not present in the container. The blocker locks the per-row Update button with an explanatory tooltip and makes `POST /containers/:id/update` return `409`. Previously the button appeared actionable and the update failed mid-flight with a socket error. Deployments that reach Docker over TCP report self-update as available, so the button is unaffected there. The check fails open: when the watcher cannot be resolved, the blocker is not raised.
- i18n coverage extended to the remaining hardcoded UI strings across 28 components (discussion #329, commit `1b65e591`). A full audit of all 82 UI components found approximately 110 English strings that bypassed `vue-i18n` and rendered raw regardless of the active locale. The extraction sweep covers: `AppLayout` search scopes, group labels, section subtitles, and the five deprecation banner bodies (converted to `<i18n-t>` so embedded `<code>` elements stay translatable); `ThemeToggle` variant names; `DataFilterBar` view-mode names; `DetailPanel` size labels (S / M / L); update-kind labels (Major/Minor/Patch/Digest) in `ContainerFullPageDetail` and `ContainerFullPageTabContent`, which are now reactive `computed` maps so locale switches take effect without a page reload; tail/status labels and the stdout/stderr stream-type labels in `ContainerLogs`; action tooltips and button labels in `ContainersGroupedViews`; error messages and empty-state fallback labels across the Agents, Config, Registries, Triggers, Watchers, Notifications, NotificationOutbox, and Security views; and the `WATCHING` watcher-status badge, which was rendering the raw backend enum string. New keys land in `en/appShell.json`, `en/containerComponents.json`, `en/listViews.json`, `en/sharedComponents.json`, `en/configView.json`, `en/agentsView.json`, and `en/notificationOutboxView.json`. Non-translatable identifiers (the product name "Drydock" and format strings such as `spdx-json` / `cyclonedx-json`) are intentionally left as literals. Other locales pick up the new keys via the `en` fallback immediately and will receive human translations on the next Crowdin sync.
Changed
- The self-update helper now prefers the bind-mounted Docker socket over a TCP watcher connection (commit `aa828d88`). The previous `resolveHelperDockerConnection` logic checked the watcher's TCP modem first. As a result, any deployment where Drydock was configured with a TCP host (e.g. routing through a socket proxy) would always route the helper through that proxy, even when the target container itself had `/var/run/docker.sock` bind-mounted. For infrastructure updates (`dd.update.mode=infrastructure`), where the container being replaced is the socket proxy, this is fatal: the helper relies on the proxy being up, but the update stops it. The resolution order is now inverted: `findDockerSocketBind` runs first, and if the target container carries a socket bind, the helper uses that direct socket path regardless of the watcher's TCP configuration. The TCP path is preserved as the fallback for pure socket-less deployments where Drydock reaches Docker exclusively over a remote host.
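The resolution order described in the rc.23 Added and Changed entries (socket bind first, then TCP, otherwise unavailable) can be sketched as a pure function. All shapes here are hypothetical: `MountInfo`, `WatcherConnection`, and `HelperConnection` stand in for drydock's real types. The default port `2375` and protocol `http` are assumptions made for the sketch only.

```typescript
// Hypothetical shapes; drydock's real types and field names may differ.
interface MountInfo {
  Source: string;
  Destination: string;
}

interface WatcherConnection {
  host?: string;
  port?: number;
  protocol?: 'http' | 'https';
}

type HelperConnection =
  | { kind: 'socket'; socketPath: string }
  | { kind: 'tcp'; host: string; port: number; protocol: 'http' | 'https' }
  | { kind: 'unavailable' };

const DOCKER_SOCKET_PATH = '/var/run/docker.sock';

// Returns the host-side path of a bind-mounted Docker socket, if any.
function findDockerSocketBind(mounts: MountInfo[]): string | undefined {
  return mounts.find((m) => m.Destination === DOCKER_SOCKET_PATH)?.Source;
}

function resolveHelperDockerConnection(
  mounts: MountInfo[],
  watcher: WatcherConnection,
): HelperConnection {
  // 1. A bind-mounted socket wins, so replacing the socket proxy itself
  //    (infrastructure mode) never depends on the proxy staying up.
  const socketPath = findDockerSocketBind(mounts);
  if (socketPath !== undefined) return { kind: 'socket', socketPath };
  // 2. Otherwise fall back to the watcher's TCP host (socket-less deployments).
  //    The 2375 / 'http' defaults are assumptions for this sketch.
  if (watcher.host) {
    return {
      kind: 'tcp',
      host: watcher.host,
      port: watcher.port ?? 2375,
      protocol: watcher.protocol ?? 'http',
    };
  }
  // 3. Neither path exists: the case the self-update-unavailable blocker reports.
  return { kind: 'unavailable' };
}
```

Returning a discriminated union keeps the "unavailable" outcome explicit, which is the same condition the UI surfaces as the locked Update button.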
Fixed
- The Dashboard Host Status widget no longer auto-scrolls to the last host when the host list changes (commit `cbe815a6`). The full-mode host list used `scroll-snap-type: y mandatory` with a measured tail spacer. Whenever the host-row set changed (a watcher or agent added, removed, or renamed, or a full-to-compact mode transition), Chromium re-snapped to the last row's snap point, leaving only the final host visible above a large empty gap. The scroll-snap classes (`snap-y`, `snap-mandatory`, `snap-start`), the dynamic tail-spacer element, and the measurement machinery behind it (the `onUpdated` hook, `requestAnimationFrame` scheduler, and `ResizeObserver`-triggered recompute) have all been removed. The content-aware full/compact sizing that keeps whole rows visible was already sufficient; the snapping added no functional value and actively fought the layout on every data change.
- The Dashboard Resource Usage widget's minimum height was raised so the per-container CPU and Memory lists stay visible (commit `59719757`). The `resource-usage` widget's `minH` was set to 3 grid units (approximately 122 px), which falls below the 180 px threshold at which the per-container lists collapse out of view. The minimum is now 7 grid units (approximately 306 px). `applyConstraints` clamps any saved layout item that is below the new minimum on load, so existing dashboard configurations with a shrunken resource-usage widget are silently corrected on the next render rather than persisting an unusable layout.
- `AgentClient` timers are now cleared when an agent is removed, preventing orphaned timeouts (commit `03bf7211`). `AgentClient` maintains two `setTimeout` handles: `stableConnectionTimer` (armed 30 s after the SSE response arrives, to reset the backoff counter) and `reconnectTimer` (fires the next reconnect attempt after the exponential-backoff delay). Neither was cancelled when `removeAgent` spliced the client out of the manager's list. An agent removed mid-reconnect-cycle or mid-stability-window kept its timers armed after removal, which could trigger a `startSse` call against a client that was no longer tracked and leak the associated resources. A new idempotent `stop()` method on `AgentClient` cancels both timers and nulls the handles; `removeAgent` now calls `stop()` on each matching client before splicing it.
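The timer cleanup can be sketched with a stripped-down client. `AgentClientSketch`, `armStableConnectionTimer`, `scheduleReconnect`, and `hasPendingTimers` are hypothetical names for illustration. Only the two timer fields, the idempotent `stop()`, and the stop-before-splice order in `removeAgent` come from the note above.

```typescript
// Minimal sketch of the two AgentClient timers and an idempotent stop().
class AgentClientSketch {
  readonly agentName: string;
  private stableConnectionTimer: ReturnType<typeof setTimeout> | null = null;
  private reconnectTimer: ReturnType<typeof setTimeout> | null = null;

  constructor(agentName: string) {
    this.agentName = agentName;
  }

  // Armed after the SSE response arrives; resets backoff if the link stays up.
  armStableConnectionTimer(onStable: () => void, ms = 30_000): void {
    if (this.stableConnectionTimer !== null) clearTimeout(this.stableConnectionTimer);
    this.stableConnectionTimer = setTimeout(onStable, ms);
  }

  // Fires the next reconnect attempt after the backoff delay.
  scheduleReconnect(reconnect: () => void, delayMs: number): void {
    if (this.reconnectTimer !== null) clearTimeout(this.reconnectTimer);
    this.reconnectTimer = setTimeout(reconnect, delayMs);
  }

  // Cancels both timers and nulls the handles; calling it twice is harmless.
  stop(): void {
    if (this.stableConnectionTimer !== null) clearTimeout(this.stableConnectionTimer);
    if (this.reconnectTimer !== null) clearTimeout(this.reconnectTimer);
    this.stableConnectionTimer = null;
    this.reconnectTimer = null;
  }

  hasPendingTimers(): boolean {
    return this.stableConnectionTimer !== null || this.reconnectTimer !== null;
  }
}

// Stop each matching client before splicing it out of the manager's list.
function removeAgent(clients: AgentClientSketch[], agentName: string): void {
  for (let i = clients.length - 1; i >= 0; i -= 1) {
    if (clients[i].agentName === agentName) {
      clients[i].stop();
      clients.splice(i, 1);
    }
  }
}
```

Iterating backwards keeps the indices valid while splicing, so several clients with the same name are all stopped and removed in one pass.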
Security
- The TCP Docker host is validated before the self-update controller passes it to Dockerode (commit `441b4358`). `DD_SELF_UPDATE_DOCKER_HOST` was forwarded to Dockerode without sanitization. A new `validateTcpDockerHost` function rejects values that contain a URL scheme prefix (`tcp://`, `http://`, `https://`, or any `<scheme>://` form), a userinfo segment (`@`), whitespace, or path separators (`/` or `\`), throwing a descriptive error before any network connection is attempted. This prevents an environment variable or compose-file value from inadvertently injecting a path or URL component that Dockerode would interpret in an unexpected way. The validated host and resolved port are also logged at `INFO` level so the connection target is auditable in the container logs. `runSelfUpdateController` was additionally refactored to remove a control-flow asymmetry: the socket and TCP paths previously diverged into separate Dockerode construct-and-run blocks; they now share a single tail (`disableSocketRedirects` remains socket-only).
- OIDC error logs now redact RFC-1918 IP addresses and absolute filesystem paths (commit `9b79de77`). The rc.22 `getErrorChainMessage` improvement walks `error.cause` chains up to depth 5 and appends the results to OIDC warn logs, which is the right diagnostic behaviour. However, TLS and connection errors in Node/undici frequently include private network addresses (e.g. `connect ECONNREFUSED 10.0.0.5:2376`) and absolute filesystem paths (e.g. `error loading /etc/ssl/certs/ca-bundle.pem`) that should not appear in logs shipped to centralised observability systems. `sanitizeOidcErrorMessage` now applies two additional redaction passes after the existing URL and bearer-token passes: RFC-1918 IPv4 ranges (10.x, 172.16–31.x, 192.168.x) with an optional port are replaced with `[internal-addr]`; absolut...
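The rejection rules listed for `validateTcpDockerHost` translate directly into a few checks. This is a minimal sketch of those rules in the order listed; the error wording and the exact scheme pattern are illustrative assumptions, not drydock's implementation.

```typescript
// Minimal sketch of the checks listed above; the error wording is illustrative.
function validateTcpDockerHost(host: string): string {
  // Any "<scheme>://" prefix, e.g. tcp://, http://, https://.
  if (/^[A-Za-z][A-Za-z0-9+.-]*:\/\//.test(host)) {
    throw new Error(`Docker host must be a bare hostname, not a URL (got "${host}")`);
  }
  // Userinfo segment such as user:pass@host.
  if (host.includes('@')) {
    throw new Error('Docker host must not contain a userinfo segment (@)');
  }
  if (/\s/.test(host)) {
    throw new Error('Docker host must not contain whitespace');
  }
  if (/[\/\\]/.test(host)) {
    throw new Error('Docker host must not contain path separators (/ or \\)');
  }
  return host;
}

console.log(validateTcpDockerHost('socket-proxy')); // socket-proxy
```

The scheme check runs before the generic path-separator check so that a value like `tcp://socket-proxy` gets the more specific message instead of a complaint about `/`.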
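The RFC-1918 redaction pass can be illustrated with a single regular expression covering 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, each with an optional `:port`. This is a hypothetical sketch of that one pass, not drydock's `sanitizeOidcErrorMessage`. The filesystem-path pass is cut off in the notes above and is not reproduced here.

```typescript
// 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16, each with an optional :port.
const RFC1918_WITH_PORT =
  /\b(?:10(?:\.\d{1,3}){3}|172\.(?:1[6-9]|2\d|3[01])(?:\.\d{1,3}){2}|192\.168(?:\.\d{1,3}){2})(?::\d{1,5})?\b/g;

function redactInternalAddresses(message: string): string {
  return message.replace(RFC1918_WITH_PORT, '[internal-addr]');
}

console.log(redactInternalAddresses('connect ECONNREFUSED 10.0.0.5:2376'));
// connect ECONNREFUSED [internal-addr]
```

Public addresses such as `8.8.8.8` and out-of-range neighbours such as `172.32.0.1` are left untouched, so the log keeps enough context to distinguish internal from external failures.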
v1.5.0-rc.22
[1.5.0-rc.22] — 2026-05-15
Added
- All 16 non-English locales now have full key parity with the English source (commits `5e463631`, `012dcb83`). Two complementary passes bring every translation up to date. The first pass (`5e463631`) filled gaps in the ten locales that were already mostly translated (de, es, fr, it, nl, pl, pt-BR, tr, zh-CN, zh-TW). Each was missing `notificationOutboxView.json` entirely and had drifted behind recent string extractions in `listViews.json`, `containerComponents.json`, `containersView.json`, and `dashboardView.json` (new keys: `digestLabel`, `blockedTag` variants, `manualUpdateOnly` variants, `narrowViewportSuffix`, `autoHiddenBadgeTooltip`, queued-update toast variants, `recentUpdates.widgetAria`). A JSON-breaking typo in `de/dashboardView.json` (a straight quote instead of a closing curly quote) was also corrected. The second pass (`012dcb83`) gave the six stub locales (ar, ja, ko, ru, uk, vi), which had been scaffolded with English placeholders since rc.20, a full translation pass across all 13 namespace files plus the new `notificationOutboxView.json`. Brand names, acronyms, and interpolation placeholders are preserved verbatim; DevOps terminology follows each language's established conventions.
Changed
- Playwright E2E tests moved to a dedicated workflow file (`e2e-playwright.yml`) (commit `f0989301`). OSSF Scorecard's CI-Tests check scores from the github-actions Check Suite conclusion, not from individual check-run conclusions. Because every job in `ci-verify.yml` rolled into a single suite, one failing Playwright assertion would flip the entire suite to failure and cause Scorecard to mark merged PRs as untested, even when all other jobs were green (manifesting as code-scanning alert #43, CI-Tests score 9/10). Each workflow file gets its own Check Suite per commit, so isolating Playwright into `e2e-playwright.yml` means a Playwright failure no longer drags the `ci-verify` suite down for Scorecard's purposes. Branch protection continues to gate on the "🎭 E2E: Playwright" status check (matched by job name, not workflow file), and `release-cut.yml` now polls both workflows on the target SHA, so releases still require Playwright success.
Fixed
- #368 — OIDC custom-dispatcher paths (cafile / DD_AUTH_OIDC_*_INSECURE=true) no longer fail with an opaque TypeError: fetch failed on Node 24. Node 24 ships built-in undici 7.21.0 (v1 dispatcher interface), while the app's userland undici@8 (bumped in rc.20) exposes an Agent with the v2 dispatcher interface. The OIDC custom fetch was constructing the v2 Agent from userland undici and passing it as dispatcher to Node's global fetch, which is bound to the built-in undici 7. The v2 Agent's handlers don't satisfy the v1 contract, so the request silently fails. This was the surface symptom reported by a user upgrading rc.19 → rc.21 against a self-signed Authentik with DD_AUTH_OIDC_AUTHENTIK_INSECURE=true. The undici project's Dispatcher1Wrapper bridge (nodejs/undici#4827) covers this mismatch on Node 22 but is absent on Node 24. The fix imports fetch from undici and uses it whenever a custom dispatcher is required (cafile or insecure path), so both halves share the same dispatcher version. The default code path is unchanged: openid-client continues to use its default fetch when no custom dispatcher is needed. A strict tsc type error introduced by the same fix (undici's nominal RequestInfo/Response types differ from the lib.dom types that openid-client's CustomFetch is typed against) was resolved by casting through unknown at the boundary using Parameters<typeof undiciFetch> and ReturnType<openidClientLibrary.CustomFetch>; there is no runtime behavior change.
- OIDC warn logs now surface the full error.cause chain, making TLS and DNS failures actionable (commit 720d99a3). undici's fetch reports failures as a generic TypeError: fetch failed; the actionable diagnostic (ENOTFOUND, ECONNREFUSED, UNABLE_TO_VERIFY_LEAF_SIGNATURE, etc.) lives on error.cause, sometimes nested. The previous error sanitizer logged only the top-level message, so issue #368 reached us with only "Unable to initialize OIDC session (fetch failed)" and no indication of whether DNS, TLS, or routing was at fault. A new getErrorChainMessage helper walks error.cause up to depth 5, joining parts with ← and appending [code] when a code property is present; a WeakSet guards against cyclic cause chains. sanitizeOidcErrorMessage now uses it, so all OIDC warn logs include the cause chain (still passed through the existing URL and token redaction). This is a forward-only diagnostic improvement with no runtime behavior change for healthy OIDC paths.
- #362 — SSE reconnect exponential backoff no longer collapses to a flat 1 s loop when the agent is struggling. AgentClient.startSse() previously called this.reconnectAttempts = 0 the instant the axios response headers arrived, before the stream had proven it could stay open. A crash-looping agent, a reverse proxy with a short upstream idle timeout, or any situation where the SSE stream returned HTTP 200 and then ended almost immediately would cycle as: connect → 200 → reconnectAttempts = 0 → stream ends → scheduleReconnect() (delay = 1 000 ms, attempts → 1) → 1 s later connect → 200 → reconnectAttempts = 0 again, and so on forever. The user who filed #362 saw "SSE stream ended. Reconnecting..." in their controller logs every ~1.00 s indefinitely, with no escalation. The backoff now resets only after the stream has stayed open for SSE_STABLE_CONNECTION_MS (30 s). A setTimeout is armed when the response arrives and cancelled by scheduleReconnect() if the stream ends or errors before the window expires. Streams that end early therefore keep their accumulated reconnectAttempts, and the delay continues to double up to the 60 s cap as intended.
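The cause-chain walk described in the OIDC warn-log entry can be sketched as follows. This is a hypothetical reimplementation written only from the description above (depth limit of 5, ← separator, [code] suffix, WeakSet cycle guard), not drydock's actual getErrorChainMessage. Counting the depth as the total number of errors printed is an assumption.

```typescript
// Hypothetical sketch: walk error.cause, formatting each link as
// "message [code]" and joining links with " ← ".
function getErrorChainMessageSketch(error: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  const seen = new WeakSet<object>(); // stops the walk on cyclic cause chains
  let current: unknown = error;

  // Stop at the depth limit, at a non-Error cause, or on a repeated error.
  while (parts.length < maxDepth && current instanceof Error && !seen.has(current)) {
    seen.add(current);
    const code = (current as Error & { code?: unknown }).code;
    parts.push(code === undefined ? current.message : `${current.message} [${String(code)}]`);
    current = (current as Error & { cause?: unknown }).cause;
  }
  return parts.length > 0 ? parts.join(" ← ") : String(error);
}
```

For undici's TypeError: fetch failed wrapping an ECONNREFUSED socket error, this yields fetch failed ← connect ECONNREFUSED [ECONNREFUSED]. That string would then still go through the URL and token redaction passes.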
v1.5.0-rc.21
[1.5.0-rc.21] — 2026-05-15
Added
- i18n coverage extended to the notification outbox, notification rules, and registry/server status badges (discussion #329). Four UI surfaces still rendered hardcoded English even under the zh-CN locale: NotificationOutboxView (tab labels, table headers, action buttons, toast messages), the notification rule name/description column in NotificationsView, and the connection-status badges in RegistriesView and ServersView. All four are now extracted into the existing t() catalogs (new notificationOutboxView namespace; new entries under notificationsView.rules.*, registriesView.status.*, and serversView.status.*). Rule names and statuses use te() so backend-supplied custom names fall back to the raw string when the catalog has no entry. Translation files for other locales will pick up the new keys on the next Crowdin sync.
- DD_AGENT_ALLOW_INSECURE_SECRET escape hatch for closed-LAN deployments. rc.20 tightened the agent-secret-over-HTTP check from a warning to a hard error in app/agent/AgentClient.ts. rc.21 introduces DD_AGENT_ALLOW_INSECURE_SECRET=true as an explicit controller-side opt-in for environments (isolated private LANs, air-gapped setups) where the operator accepts that the agent secret travels in cleartext. Default behavior is unchanged: without the flag, the boot-time error is still thrown. When the flag is set to exactly true, the error is downgraded to a log.warn on every startup, so the security signal is preserved and visible in logs. Any other value (e.g. 1, yes, TRUE) continues to throw. See content/docs/current/configuration/agents/index.mdx for guidance on recommended alternatives (certfile/cafile, reverse proxy TLS termination).
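The strict opt-in rule for the agent-secret check can be sketched as follows: only the exact, case-sensitive string true downgrades the error. checkAgentSecretTransport and its parameters are hypothetical names used for illustration. The sketch assumes a secret is configured for the agent. The real check lives in app/agent/AgentClient.ts and may be structured differently.

```typescript
type Logger = { warn: (message: string) => void };

// Hypothetical sketch: refuse to send the agent secret over plain HTTP unless
// DD_AGENT_ALLOW_INSECURE_SECRET is exactly "true" (case-sensitive).
function checkAgentSecretTransport(
  agentUrl: string,
  env: Record<string, string | undefined>,
  log: Logger,
): void {
  if (!agentUrl.toLowerCase().startsWith("http://")) return; // TLS: nothing to check

  const message = `Agent secret would be sent in cleartext to ${agentUrl}`;
  if (env.DD_AGENT_ALLOW_INSECURE_SECRET === "true") {
    // Explicit opt-in: keep the signal visible on every startup.
    log.warn(`${message} (allowed by DD_AGENT_ALLOW_INSECURE_SECRET=true)`);
    return;
  }
  // Unset, "1", "yes", "TRUE", etc. all keep the hard boot-time error.
  throw new Error(message);
}
```

Exact-match parsing trades convenience for safety. A typo or a truthy-looking value never silently weakens the check.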
Changed
- Default watcher cron relaxed from hourly to every 6 hours (#342 follow-up). app/watchers/providers/docker/Docker.ts now defaults cron to 0 */6 * * * (every 6 hours) instead of 0 * * * * (hourly). Hourly polling was the most aggressive default among active 2026 update managers: Diun ships 0 */6 * * *, Watchtower (archived Dec 2025) defaulted to 24 h, and our upstream WUD ships no default at all. With fleets of 20+ containers and image tag lists that paginate to thousands of entries (immich-server spans 24+ GHCR pages), hourly polling saturates anonymous Docker Hub limits (100 pulls / 6 h) and trips GitHub's 5 k req/h release-notes ceiling. rc.20's per-host token bucket and Retry-After handling remain as the safety net; the default change addresses the root cause. Users who set DD_WATCHER_{name}_CRON explicitly are unaffected. Users who want near-real-time detection (security patches) can still set DD_WATCHER_{name}_CRON=0 * * * *. The docs (content/docs/current/configuration/watchers/index.mdx, content/docs/current/api/watcher.mdx, content/docs/current/api/agent.mdx) have been updated to reflect the new default.
Fixed
- discussion #295 — The release-notes icon in the container table now always opens the same popover, even when only an external release URL is available. Previously, the file-text icon in the actions column behaved differently depending on what release-notes metadata had been fetched for the container. Containers with structured notes (title + body, e.g. from the GitHub release-notes provider) got an icon button that opened a popover with an expandable preview. Containers with only a bare releaseLink URL got an icon that was a direct external <a> with no popover. The popover shell now renders uniformly for both cases. When only releaseLink is available, the popover contains a single row that links out to the external URL (with an external-link indicator instead of the chevron used by expandable rows), and clicking the row dismisses the popover before navigating. Containers that already have structured notes are unaffected: the existing popover and inline-expander behavior is unchanged.
- Docker multi-arch build no longer fails when Alpine repos drift between archs. The Dockerfile pinned curl=8.17.0-r1, which broke the linux/arm64 build leg after Alpine's latest-stable/aarch64 mirror rotated to curl-8.19.0-r0 while latest-stable/x86_64 was still on 8.17.0-r1. No single pin can satisfy both archs during a mirror rotation window, so the curl entry is now unpinned and apk installs whatever version is current for each arch. Other pinned packages still match across both archs and stay pinned.
- #362 — DD_SESSION_SECRET no longer crashes startup when unset; the secret is auto-generated and persisted to the store on first boot. rc.20 made DD_SESSION_SECRET a hard requirement (commit b9e8be38) to close a real issue: the prior fallback generated a fresh per-process random secret on every restart, which silently invalidated every active session whenever drydock restarted. However, the hard requirement shipped without a migration path, so existing deployments that didn't set the variable hit an immediate boot crash on upgrade. The fallback is now restored as a persisted secret. On first boot without DD_SESSION_SECRET set, drydock generates 64 random bytes (randomBytes(64).toString('hex')) and writes them to a new secrets collection inside /store/dd.json. Subsequent boots read the persisted value, so sessions survive restarts. The env var still takes precedence when set (whitespace-only values are treated as unset). Operators upgrading from rc.20 with no DD_SESSION_SECRET configured will boot cleanly; deployments that already set the variable see no change.
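The session-secret resolution order can be sketched as below: the env var comes first, whitespace-only values count as unset, and otherwise a secret is generated once and persisted. The SecretStore interface, the sessionSecret key, and resolveSessionSecret are hypothetical stand-ins for the secrets collection in /store/dd.json, not drydock's actual code.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical persistence interface standing in for the store's secrets collection.
interface SecretStore {
  get(name: string): string | undefined;
  set(name: string, value: string): void;
}

function resolveSessionSecret(envValue: string | undefined, store: SecretStore): string {
  // 1. An explicit, non-blank DD_SESSION_SECRET always wins.
  if (envValue !== undefined && envValue.trim() !== "") return envValue;

  // 2. Reuse the secret persisted on an earlier boot so sessions survive restarts.
  const persisted = store.get("sessionSecret");
  if (persisted) return persisted;

  // 3. First boot: generate 64 random bytes (128 hex characters) and persist them.
  const generated = randomBytes(64).toString("hex");
  store.set("sessionSecret", generated);
  return generated;
}
```

The persistence step is what separates this from the pre-rc.20 fallback. A per-process random secret would change on every restart, while a stored one stays stable.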
v1.5.0-rc.20
[1.5.0-rc.20] — 2026-05-14
Added
- Fleet-aggregate stats subsystem (commits feature/v1.5-rc17). A new ContainerStatsAggregator polls each locally monitored container once per tick (default 10 s) and computes a fleet-wide ContainerStatsSummary (total CPU%, total memory, top-N rows). Two new endpoints, GET /api/v1/stats/summary and GET /api/v1/stats/summary/stream, expose the current snapshot and a live SSE feed. The dashboard Resource Usage widget now consumes the SSE stream directly. This fixes the regression (introduced in rc.13 by the ?touch=false workaround) where the widget showed zeros because the per-container cache was never warmed. The legacy GET /api/v1/containers/stats endpoint and the client-side summarizeContainerResourceUsage rollup have been removed.
- Per-container update locks (commit 761fb834). A new keyed LockManager primitive in app/updates/lock-primitives.ts replaces the module-level pLimit(1) that was serialising every container update across the entire process. Lock keys are derived per container (and per compose project for Dockercompose), so two unrelated containers can now pull and recreate concurrently while two services in the same compose project still serialise correctly. The lock primitive lives in its own pure-logic file with full unit tests; the docker trigger and its compose subclass derive the lock key set via a new getUpdateLockKeys(container) method.
- Restart recovery for queued and pulling updates (commit 00788b13). Startup reconciliation in app/store/update-operation.ts is now selective: status=queued operations stay queued for the recovery dispatcher to pick up, and phase=pulling rows are reset to queued (pull is idempotent). All other in-progress phases (prepare, renamed, new-created, old-stopped, new-started, health-gate, rollback-*) are still marked failed because they leave inconsistent state that an operator should review. A new app/updates/recovery.ts module runs once after registry.init(), re-resolves the trigger and container for each queued operation, and dispatches them through the existing fire-and-forget pipeline. Operations whose container or trigger no longer exists are marked failed with an explanatory lastError so they don't sit in the queue forever.
- Notification outbox with retry and dead-letter queue (commits a9561d93, 7d2ef6eb, b215d295, ce26bece). A new notificationOutbox LokiJS collection (app/store/notification-outbox.ts) and a matching app/notifications/outbox-worker.ts background worker provide durable retry semantics for notification dispatch. Trigger.dispatchContainerForEvent now optimistically calls this.trigger(container) directly. On failure, the delivery intent is persisted to the outbox and the worker retries on a periodic drain with exponential backoff plus jitter. After a configurable number of failed attempts (default 5), entries move to the dead-letter queue; delivered and dead-letter entries are auto-purged once past their TTL (default 30 days). A new /api/notifications/outbox REST surface lets operators list entries (?status= filter), retry from the DLQ (POST /:id/retry), or discard them (DELETE /:id). The new base method Trigger.dispatchOutboxEntry(entry) is the worker's delivery hook; subclasses can override it.
- Notification outbox UI (commit feature/v1.5-rc17). A new Notification outbox page (route /notifications/outbox, nav under Settings) consumes the existing /api/notifications/outbox REST surface so operators can review the dead-letter queue, retry stuck deliveries, or discard dead entries from the UI. Status tabs (Dead-letter / Pending / Delivered) follow the same query-param convention (?status=) used by the rest of the list views, and per-bucket counts render as inline badges. Retry is shown only on dead-letter rows; Discard is available everywhere. The new ui/src/services/notification-outbox.ts mirrors the API exactly.
- Cancel queued or in-flight updates (commits 4b79e3ac, 79487115). POST /api/operations/:id/cancel now accepts both queued and in-progress operations. Queued ops are marked failed immediately with lastError: 'Cancelled by operator' (200). In-progress ops are flagged via a new cancelRequested field on the operation row, and the endpoint returns 202 Accepted. The lifecycle checks the flag at three safe checkpoints: after pull and before rename (a clean abort with no rollback needed), before creating the replacement container, and before stopping the old container. Cancellations therefore either short-circuit cleanly or fall through the existing rollback path, which renames the container back. The rollback path tags the rollback reason as cancelled so the audit trail distinguishes operator cancellations from real failures. Already-terminal ops still return 409 Conflict. The container row's Cancel action is now visible for both queued and in-progress operations; the toast says "Cancelled" for the immediate path and "Cancellation requested" for the in-progress path.
- Global concurrent-update cap (DD_UPDATE_MAX_CONCURRENT). A new counting semaphore (the Semaphore class in app/updates/lock-primitives.ts) provides a configurable global gate on how many update lifecycles run simultaneously across the entire controller instance. The default, 0, means unlimited, so there is no behavior change on upgrade. A positive integer N means at most N updates run concurrently. Negative or non-integer values fail fast at startup with a descriptive error. The cap layers on top of the existing per-container and per-compose-project locks; it does not replace them. Operations waiting on the cap remain in queued status. Scope is per controller instance; distributed agent hosts have independent counters by design. Self-update operations bypass the global cap: they take per-container locks but never wait on the global semaphore, so a full update queue cannot starve an admin-triggered self-update.
- Health-gate SSE heartbeat (DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS). Previously, while drydock waited for a new container to pass its health gate, the SSE pipeline was silent for the entire wait: the UI received no events between phase: 'health-gate' and phase: 'health-gate-passed'. For images with long healthcheck intervals (e.g. vaultwarden's 60 s check), the UI had to fall back on the REST reconciliation poll if the SSE connection was interrupted during that window. A periodic heartbeat now re-emits phase: 'health-gate' at a configurable interval (default 10 s). DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS=0 disables heartbeats entirely; values below 1000 ms or non-integers fail fast at startup. The heartbeat is cancelled as soon as the wait resolves in any direction (success, timeout, or unhealthy), so the terminal event is never preempted. No new phases are introduced; existing UI consumers accept the re-emitted event unchanged.
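The global concurrent-update cap described above can be sketched as a counting semaphore plus its startup validation. The sketch assumes a promise-based acquire/release API. parseMaxConcurrent and CountingSemaphore are illustrative names, not the actual Semaphore class in app/updates/lock-primitives.ts. Following the description, a limit of 0 disables the cap. Treating a blank value as unset is an extra assumption.

```typescript
// Parses DD_UPDATE_MAX_CONCURRENT: unset/blank or 0 means unlimited;
// negative or non-integer values fail fast at startup.
function parseMaxConcurrent(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return 0;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`DD_UPDATE_MAX_CONCURRENT must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Counting semaphore: at most `limit` concurrent holders; limit 0 disables the cap.
class CountingSemaphore {
  private readonly limit: number;
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  async acquire(): Promise<void> {
    if (this.limit === 0 || this.active < this.limit) {
      this.active++;
      return;
    }
    // Queue (FIFO) until release() hands this caller a slot.
    await new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // transfer the slot directly; the active count stays the same
    } else {
      this.active--;
    }
  }

  get inUse(): number {
    return this.active;
  }
}
```

An update lifecycle would take its per-container locks first, then call acquire(), and call release() in a finally block. A self-update would skip the semaphore entirely.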
Changed
- Crowdin export configuration aligned with app locale folders. Crowdin now maps language codes such as es-ES to the locale folder IDs the UI actually loads (for example es) and only downloads languages exposed in the locale picker. A new config guard test prevents future sync PRs from adding ignored region-coded folders, and the new auto-hidden-columns tooltip source avoids the English-only column(s) punctuation that triggered Crowdin QA warnings for translated strings.
- Shared DataTable column sizing overhaul (commit 596adcd2). All first-party table surfaces now route through the shared DataTable component with numeric sizing metadata (size, minSize, maxSize, flex, priority, overflow, autoSize) instead of ad-hoc string widths. Tables render a stable <colgroup>, keep actions in an independent sticky/fixed managed column, support pointer and keyboard column resizing, autosize a column to its visible content on double-click, and persist manual/autosized widths per table via browser preferences. The Containers view uses the sizing data for responsive auto-hide math, so narrow widths hide lower-priority metadata instead of shrinking columns below readable minimums. The config webhook endpoint list was migrated too, and a new architecture test fails if raw <table> markup or string column widths reappear in ui/src.
- Watcher dispatch is fully fire-and-forget (commit 5cfa2286). Trigger.runUpdateAvailableSimpleTrigger and runAcceptedUpdateBatch previously awaited runAcceptedContainerUpdates, so a slow update lifecycle stalled the next watcher tick. The API path was already fire-and-forget; the watcher path now matches. A new dispatchAccepted(accepted) helper centralises the void runAcceptedContainerUpdates(...).catch(() => undefined) pattern across all four call sites. Per-operation failures are still terminalised inside the lifecycle handler, so swallowing the dispatch chain's rejection loses no observable information.
- Security alert emit is non-blocking inside the update lifecycle (commit 6c5198dd). SecurityGate.maybeEmitHighSeverityAlert was awaited inside evaluateScanOutcome, which itself runs on the update lifecycle's critical path. With multiple notifiers registered for security alerts, the await chained sequential provider calls (SMTP, Slack, HTTP, MQTT, webhook) into the lifecycle, multiplying latency before pull/recreate could even start. The function now returns synchronously after firing the emit; notification dispatch semantics from t...
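The fire-and-forget watcher dispatch in the rc.20 Changed list can be sketched as below. Only the void ... .catch(() => undefined) shape comes from the notes. AcceptedUpdate and UpdateRunner are placeholder assumptions. The runner is passed in only to keep the sketch self-contained, whereas the real helper is documented as dispatchAccepted(accepted).

```typescript
// Placeholder shape for an accepted update request (assumption).
interface AcceptedUpdate {
  operationId: string;
  containerId: string;
}

type UpdateRunner = (accepted: AcceptedUpdate[]) => Promise<void>;

function dispatchAccepted(accepted: AcceptedUpdate[], run: UpdateRunner): void {
  // Deliberately not awaited: the watcher tick must not wait for pull/recreate.
  // The rejection is swallowed because each operation's failure is assumed to be
  // recorded (terminalised) on its operation row inside the lifecycle handler.
  void run(accepted).catch(() => undefined);
}
```

The `void` marks the floating promise as intentional for linters. The `.catch` prevents an unhandled rejection from crashing the process.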
v1.5.0-rc.19
[1.5.0-rc.19] — 2026-05-12
Added
- Fleet-aggregate stats subsystem (commits feature/v1.5-rc17). A new ContainerStatsAggregator polls each locally monitored container once per tick (default 10 s) and computes a fleet-wide ContainerStatsSummary (total CPU%, total memory, top-N rows). Two new endpoints, GET /api/v1/stats/summary and GET /api/v1/stats/summary/stream, expose the current snapshot and a live SSE feed. The dashboard Resource Usage widget now consumes the SSE stream directly. This fixes the regression (introduced in rc.13 by the ?touch=false workaround) where the widget showed zeros because the per-container cache was never warmed. The legacy GET /api/v1/containers/stats endpoint and the client-side summarizeContainerResourceUsage rollup have been removed.
- Per-container update locks (commit 761fb834). A new keyed LockManager primitive in app/updates/lock-primitives.ts replaces the module-level pLimit(1) that was serialising every container update across the entire process. Lock keys are derived per container (and per compose project for Dockercompose), so two unrelated containers can now pull and recreate concurrently while two services in the same compose project still serialise correctly. The lock primitive lives in its own pure-logic file with full unit tests; the docker trigger and its compose subclass derive the lock key set via a new getUpdateLockKeys(container) method.
- Restart recovery for queued and pulling updates (commit 00788b13). Startup reconciliation in app/store/update-operation.ts is now selective: status=queued operations stay queued for the recovery dispatcher to pick up, and phase=pulling rows are reset to queued (pull is idempotent). All other in-progress phases (prepare, renamed, new-created, old-stopped, new-started, health-gate, rollback-*) are still marked failed because they leave inconsistent state that an operator should review. A new app/updates/recovery.ts module runs once after registry.init(), re-resolves the trigger and container for each queued operation, and dispatches them through the existing fire-and-forget pipeline. Operations whose container or trigger no longer exists are marked failed with an explanatory lastError so they don't sit in the queue forever.
- Notification outbox with retry and dead-letter queue (commits a9561d93, 7d2ef6eb, b215d295, ce26bece). A new notificationOutbox LokiJS collection (app/store/notification-outbox.ts) and a matching app/notifications/outbox-worker.ts background worker provide durable retry semantics for notification dispatch. Trigger.dispatchContainerForEvent now optimistically calls this.trigger(container) directly. On failure, the delivery intent is persisted to the outbox and the worker retries on a periodic drain with exponential backoff plus jitter. After a configurable number of failed attempts (default 5), entries move to the dead-letter queue; delivered and dead-letter entries are auto-purged once past their TTL (default 30 days). A new /api/notifications/outbox REST surface lets operators list entries (?status= filter), retry from the DLQ (POST /:id/retry), or discard them (DELETE /:id). The new base method Trigger.dispatchOutboxEntry(entry) is the worker's delivery hook; subclasses can override it.
- Notification outbox UI (commit feature/v1.5-rc17). A new Notification outbox page (route /notifications/outbox, nav under Settings) consumes the existing /api/notifications/outbox REST surface so operators can review the dead-letter queue, retry stuck deliveries, or discard dead entries from the UI. Status tabs (Dead-letter / Pending / Delivered) follow the same query-param convention (?status=) used by the rest of the list views, and per-bucket counts render as inline badges. Retry is shown only on dead-letter rows; Discard is available everywhere. The new ui/src/services/notification-outbox.ts mirrors the API exactly.
- Cancel queued or in-flight updates (commits 4b79e3ac, 79487115). POST /api/operations/:id/cancel now accepts both queued and in-progress operations. Queued ops are marked failed immediately with lastError: 'Cancelled by operator' (200). In-progress ops are flagged via a new cancelRequested field on the operation row, and the endpoint returns 202 Accepted. The lifecycle checks the flag at three safe checkpoints: after pull and before rename (a clean abort with no rollback needed), before creating the replacement container, and before stopping the old container. Cancellations therefore either short-circuit cleanly or fall through the existing rollback path, which renames the container back. The rollback path tags the rollback reason as cancelled so the audit trail distinguishes operator cancellations from real failures. Already-terminal ops still return 409 Conflict. The container row's Cancel action is now visible for both queued and in-progress operations; the toast says "Cancelled" for the immediate path and "Cancellation requested" for the in-progress path.
- Global concurrent-update cap (DD_UPDATE_MAX_CONCURRENT). A new counting semaphore (the Semaphore class in app/updates/lock-primitives.ts) provides a configurable global gate on how many update lifecycles run simultaneously across the entire controller instance. The default, 0, means unlimited, so there is no behavior change on upgrade. A positive integer N means at most N updates run concurrently. Negative or non-integer values fail fast at startup with a descriptive error. The cap layers on top of the existing per-container and per-compose-project locks; it does not replace them. Operations waiting on the cap remain in queued status. Scope is per controller instance; distributed agent hosts have independent counters by design. Self-update operations bypass the global cap: they take per-container locks but never wait on the global semaphore, so a full update queue cannot starve an admin-triggered self-update.
- Health-gate SSE heartbeat (DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS). Previously, while drydock waited for a new container to pass its health gate, the SSE pipeline was silent for the entire wait: the UI received no events between phase: 'health-gate' and phase: 'health-gate-passed'. For images with long healthcheck intervals (e.g. vaultwarden's 60 s check), the UI had to fall back on the REST reconciliation poll if the SSE connection was interrupted during that window. A periodic heartbeat now re-emits phase: 'health-gate' at a configurable interval (default 10 s). DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS=0 disables heartbeats entirely; values below 1000 ms or non-integers fail fast at startup. The heartbeat is cancelled as soon as the wait resolves in any direction (success, timeout, or unhealthy), so the terminal event is never preempted. No new phases are introduced; existing UI consumers accept the re-emitted event unchanged.
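The per-container locking idea can be illustrated with one promise-chaining mutex per key. Work that shares any key runs serially, and work with disjoint keys runs concurrently. KeyedLock and runExclusive are hypothetical names, not the LockManager API in app/updates/lock-primitives.ts. The sketch assumes a caller passes every key an update needs (e.g. a container key plus, for compose services, a project key) in one call.

```typescript
// Hypothetical keyed lock: one promise chain per key. Tasks sharing any key
// run one at a time in arrival order; tasks with disjoint keys run concurrently.
class KeyedLock {
  private readonly tails = new Map<string, Promise<void>>();

  async runExclusive<T>(keys: string[], task: () => Promise<T>): Promise<T> {
    // Acquire keys in sorted, de-duplicated order so two callers can never
    // hold-and-wait on the same pair of keys in opposite orders.
    const ordered = Array.from(new Set(keys)).sort();
    const releases: Array<() => void> = [];
    try {
      for (const key of ordered) {
        const previous = this.tails.get(key) ?? Promise.resolve();
        let release: () => void = () => undefined;
        const held = new Promise<void>((resolve) => {
          release = () => resolve();
        });
        // The next caller for this key waits until this holder releases.
        this.tails.set(key, previous.then(() => held));
        releases.push(release);
        await previous;
      }
      return await task();
    } finally {
      for (const release of releases) release();
    }
  }
}
```

A production version would also delete a key's map entry once its chain drains, so the map does not grow with every container ever updated.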
Changed
- Crowdin export configuration aligned with app locale folders. Crowdin now maps language codes such as es-ES to the locale folder IDs the UI actually loads (for example es) and only downloads languages exposed in the locale picker. A new config guard test prevents future sync PRs from adding ignored region-coded folders, and the new auto-hidden-columns tooltip source avoids the English-only column(s) punctuation that triggered Crowdin QA warnings for translated strings.
- Shared DataTable column sizing overhaul (commit 596adcd2). All first-party table surfaces now route through the shared DataTable component with numeric sizing metadata (size, minSize, maxSize, flex, priority, overflow, autoSize) instead of ad-hoc string widths. Tables render a stable <colgroup>, keep actions in an independent sticky/fixed managed column, support pointer and keyboard column resizing, autosize a column to its visible content on double-click, and persist manual/autosized widths per table via browser preferences. The Containers view uses the sizing data for responsive auto-hide math, so narrow widths hide lower-priority metadata instead of shrinking columns below readable minimums. The config webhook endpoint list was migrated too, and a new architecture test fails if raw <table> markup or string column widths reappear in ui/src.
- Watcher dispatch is fully fire-and-forget (commit 5cfa2286). Trigger.runUpdateAvailableSimpleTrigger and runAcceptedUpdateBatch previously awaited runAcceptedContainerUpdates, so a slow update lifecycle stalled the next watcher tick. The API path was already fire-and-forget; the watcher path now matches. A new dispatchAccepted(accepted) helper centralises the void runAcceptedContainerUpdates(...).catch(() => undefined) pattern across all four call sites. Per-operation failures are still terminalised inside the lifecycle handler, so swallowing the dispatch chain's rejection loses no observable information.
- Security alert emit is non-blocking inside the update lifecycle (commit 6c5198dd). SecurityGate.maybeEmitHighSeverityAlert was awaited inside evaluateScanOutcome, which itself runs on the update lifecycle's critical path. With multiple notifiers registered for security alerts, the await chained sequential provider calls (SMTP, Slack, HTTP, MQTT, webhook) into the lifecycle, multiplying latency before pull/recreate could even start. The function now returns synchronously after firing the emit; notification dispatch semantics from t...
v1.5.0-rc.18
[1.5.0-rc.18] — 2026-05-09
Added
-
Fleet-aggregate stats subsystem (commits
feature/v1.5-rc17). NewContainerStatsAggregatorpolls each locally-monitored container once per tick (default 10 s) and computes a fleet-wideContainerStatsSummary(total CPU%, total memory, top-N rows). Two new endpoints —GET /api/v1/stats/summaryandGET /api/v1/stats/summary/stream— expose the current snapshot and a live SSE feed; the dashboard Resource Usage widget now consumes the SSE stream directly, fixing the regression (introduced in rc.13 by the?touch=falseworkaround) where the widget showed zeros because the per-container cache was never warmed. The legacyGET /api/v1/containers/statsendpoint and the client-sidesummarizeContainerResourceUsagerollup have been removed. -
Per-container update locks (commit
761fb834). New keyedLockManagerprimitive inapp/updates/lock-primitives.tsreplaces the module-levelpLimit(1)that was serialising every container update across the entire process. Lock keys are derived per container (and per compose project forDockercompose), so two unrelated containers can now pull and recreate concurrently while two services in the same compose project still serialise correctly. The lock primitive is its own pure-logic file with full unit tests; the docker trigger and compose subclass derive the lock key set via a newgetUpdateLockKeys(container)method. -
Restart recovery for queued and pulling updates (commit
00788b13). Startup reconciliation inapp/store/update-operation.tsis now selective:status=queuedoperations stay queued for the recovery dispatcher to pick up, andphase=pullingrows are reset toqueued(pull is idempotent). All other in-progress phases —prepare,renamed,new-created,old-stopped,new-started,health-gate,rollback-*— remain marked failed because they leave inconsistent state that an operator should review. A newapp/updates/recovery.tsmodule runs once afterregistry.init(), re-resolves trigger and container for each queued operation, and dispatches them through the existing fire-and-forget pipeline. Operations whose container or trigger no longer exists are marked failed with an explanatorylastErrorso they don't sit in the queue forever. -
Notification outbox with retry and dead-letter queue (commits
a9561d93,7d2ef6eb,b215d295,ce26bece). NewnotificationOutboxLokiJS collection (app/store/notification-outbox.ts) and matchingapp/notifications/outbox-worker.tsbackground worker provide durable retry semantics for notification dispatch.Trigger.dispatchContainerForEventnow optimistically callsthis.trigger(container)directly; on failure, the delivery intent is persisted to the outbox and the worker retries on a periodic drain with exponential backoff + jitter. After a configurable number of failed attempts (default 5) entries transition to the dead-letter queue; delivered and dead-letter entries are auto-purged past TTL (default 30 days). New/api/notifications/outboxREST surface lets operators list entries (?status=filter), retry from the DLQ (POST /:id/retry), or discard (DELETE /:id). New base methodTrigger.dispatchOutboxEntry(entry)is the worker's delivery hook; subclasses can override. -
Notification outbox UI (commit
feature/v1.5-rc17). NewNotification outboxpage (route/notifications/outbox, nav under Settings) consumes the existing/api/notifications/outboxREST surface so operators can review the dead-letter queue, retry stuck deliveries, or discard dead entries from the UI. Status tabs (Dead-letter / Pending / Delivered) keep the same query-param convention (?status=) used by the rest of the list views; counts per bucket render as inline badges.Retryis shown only on dead-letter rows;Discardis available everywhere. Newui/src/services/notification-outbox.tsmirrors the API exactly. -
Cancel queued or in-flight updates (commits
`4b79e3ac`, `79487115`). `POST /api/operations/:id/cancel` now accepts both queued and in-progress operations. Queued ops are marked failed immediately with `lastError: 'Cancelled by operator'` (`200 OK`). In-progress ops are flagged via a new `cancelRequested` field on the operation row, and the endpoint returns `202 Accepted`. The lifecycle observes the flag at three safe checkpoints:

1. after pull and before rename (clean abort, no rollback needed);
2. before creating the replacement container;
3. before stopping the old container.

A cancellation therefore either short-circuits cleanly or falls through the existing rollback path, which renames the container back. The rollback path tags the rollback reason as `cancelled`, so the audit trail distinguishes operator cancellations from real failures. Already-terminal ops still return `409 Conflict`. The container row's Cancel action is now visible for both queued and in-progress operations. The toast says "Cancelled" for the immediate path and "Cancellation requested" for the in-progress path.
- Global concurrent-update cap (
`DD_UPDATE_MAX_CONCURRENT`). A new counting semaphore (the `Semaphore` class in `app/updates/lock-primitives.ts`) provides a configurable global gate on how many update lifecycles can run simultaneously across the entire controller instance. The default, `0`, means unlimited, so upgrading changes nothing. A positive integer `N` means at most N updates run concurrently. Negative or non-integer values fail fast at startup with a descriptive error. The cap layers on top of the existing per-container and per-compose-project locks; it does not replace them. Operations waiting on the cap remain in `queued` status. The cap is scoped per controller instance; distributed agent hosts have independent counters by design. Self-update operations bypass the global cap: they take per-container locks but never wait on the global semaphore. This prevents a full update queue from starving an admin-triggered self-update.
- Health-gate SSE heartbeat (
`DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS`). Previously, the SSE pipeline was silent for the entire time drydock waited for a new container to pass its health gate. The UI received no events between `phase: 'health-gate'` and `phase: 'health-gate-passed'`. For images with long healthcheck intervals (e.g. vaultwarden's 60 s check), the UI therefore had to fall back on the REST reconciliation poll if the SSE connection was interrupted during that window. A periodic heartbeat now re-emits `phase: 'health-gate'` at a configurable interval (default 10 s). `DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS=0` disables heartbeats entirely. Any other value below 1000 ms, or any non-integer value, fails fast at startup. The heartbeat is cancelled as soon as the wait resolves in any direction (success, timeout, or unhealthy), so it never preempts the terminal event. No new phases are introduced, and existing UI consumers accept the re-emitted event unchanged.
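The outbox retry policy from the notification outbox entry can be sketched as a few pure functions. The policy has three parts: exponential backoff plus jitter, dead-lettering after a fixed number of failures, and a TTL purge of terminal entries. This is a minimal sketch, not drydock's worker. The `OutboxEntry` shape, the 1 s base delay, the 5-minute cap, and the "full jitter" strategy are all assumptions. Only the 5-attempt default and the 30-day TTL come from the notes.

```typescript
// Hypothetical entry shape; drydock's real notificationOutbox schema may differ.
type OutboxStatus = "pending" | "delivered" | "dead-letter";

interface OutboxEntry {
  id: string;
  status: OutboxStatus;
  attempts: number;      // failed delivery attempts so far
  nextAttemptAt: number; // epoch ms before which the worker skips this entry
  updatedAt: number;     // epoch ms of the last state change
}

const MAX_ATTEMPTS = 5;                        // default from the release notes
const PURGE_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days, from the release notes
const BASE_DELAY_MS = 1000;                    // assumed
const MAX_DELAY_MS = 5 * 60 * 1000;            // assumed upper bound on a single delay

// Exponential backoff with "full jitter": uniform in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, random: () => number = Math.random): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Record one failed delivery: schedule another retry, or move the entry to the DLQ.
function recordFailure(entry: OutboxEntry, now: number, random: () => number = Math.random): OutboxEntry {
  const attempts = entry.attempts + 1;
  if (attempts >= MAX_ATTEMPTS) {
    return { ...entry, attempts, status: "dead-letter", updatedAt: now };
  }
  return { ...entry, attempts, nextAttemptAt: now + backoffDelayMs(attempts, random), updatedAt: now };
}

// Only terminal entries (delivered or dead-letter) are purged, once older than the TTL.
function shouldPurge(entry: OutboxEntry, now: number): boolean {
  return entry.status !== "pending" && now - entry.updatedAt > PURGE_TTL_MS;
}
```

Injecting `random` keeps the jitter deterministic in tests. Full jitter spreads retries from many failed notifications across the whole backoff window instead of synchronising them.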
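The three cancellation checkpoints from the cancel-updates entry can be pictured as flag checks between lifecycle steps. This sketch assumes a simplified, synchronous lifecycle that records what it does in a log. The step names and the `runLifecycle` helper are hypothetical, as is the "remove-new" rollback step. Only the checkpoint placement, the `cancelRequested` flag, the rename-back rollback, and the `cancelled` reason come from the notes.

```typescript
// Hypothetical operation row; only cancelRequested is named in the release notes.
interface OperationRow {
  id: string;
  cancelRequested: boolean;
}

type Step = "pull" | "rename-old" | "create-new" | "stop-old" | "start-new";

// Runs the steps in order; `cancelDuring` simulates an operator cancel arriving mid-step.
function runLifecycle(op: OperationRow, cancelDuring?: Step): string[] {
  const log: string[] = [];
  let renamed = false;
  let created = false;

  const run = (step: Step): void => {
    log.push(step);
    if (step === cancelDuring) op.cancelRequested = true;
  };

  const abort = (checkpoint: string): string[] => {
    log.push(`cancelled@${checkpoint}`);
    if (renamed) {
      // Fall through to the rollback path and tag the reason for the audit trail.
      if (created) log.push("rollback:remove-new"); // assumed step; the notes only mention renaming back
      log.push("rollback:rename-back", "rollback-reason:cancelled");
    }
    return log;
  };

  run("pull");
  if (op.cancelRequested) return abort("after-pull");      // checkpoint 1: nothing to roll back
  run("rename-old");
  renamed = true;
  if (op.cancelRequested) return abort("before-create");   // checkpoint 2
  run("create-new");
  created = true;
  if (op.cancelRequested) return abort("before-stop-old"); // checkpoint 3
  run("stop-old");
  run("start-new");
  log.push("done");
  return log;
}
```

In this sketch, a cancel that arrives after the last checkpoint no longer aborts the update, and the operation runs to completion.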
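The global concurrent-update cap can be approximated by a counting semaphore that follows the `0 = unlimited` convention, fails fast on invalid input, and lets self-updates skip the gate. This is a sketch only. drydock's `Semaphore` in `app/updates/lock-primitives.ts` may expose a different API. The `parseMaxConcurrent` and `runUpdate` helpers are hypothetical, and per-container and per-compose-project locks are omitted.

```typescript
// Counting semaphore: at most `permits` concurrent holders; 0 means unlimited.
class Semaphore {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly permits: number;

  constructor(permits: number) {
    this.permits = permits;
  }

  async acquire(): Promise<void> {
    if (this.permits === 0 || this.active < this.permits) {
      this.active++;
      return;
    }
    // Wait until release() hands this caller a permit.
    await new Promise<void>((resolve) => this.waiters.push(() => resolve()));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit straight to the next waiter; the active count is unchanged
    } else {
      this.active--;
    }
  }

  get inUse(): number {
    return this.active;
  }
}

// Hypothetical startup parser: unset means 0 (unlimited); negative or non-integer values throw.
function parseMaxConcurrent(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return 0;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`DD_UPDATE_MAX_CONCURRENT must be a non-negative integer, got "${raw}"`);
  }
  return n;
}

// Hypothetical wrapper around one update lifecycle.
async function runUpdate<T>(gate: Semaphore, isSelfUpdate: boolean, work: () => Promise<T>): Promise<T> {
  if (isSelfUpdate) return work(); // self-updates never wait on the global gate
  await gate.acquire();            // an operation waiting here would still be reported as queued
  try {
    return await work();
  } finally {
    gate.release();
  }
}
```

Handing the permit directly to the next waiter means a newly arriving caller cannot slip in between a release and the waiter's wake-up.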
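The health-gate heartbeat contract can be sketched as a wrapper around the health-gate wait. It re-emits the current phase on an interval, stops before the terminal event, treats `0` as disabled, and rejects any other value below 1000 ms. `waitWithHeartbeat`, `parseHeartbeatMs`, and the `emit` callback are hypothetical names for illustration, not drydock APIs.

```typescript
type HealthGateEvent = { phase: string };
type Emit = (event: HealthGateEvent) => void;

const DEFAULT_HEARTBEAT_MS = 10000; // default from the release notes

function parseHeartbeatMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_HEARTBEAT_MS;
  const n = Number(raw);
  // 0 disables the heartbeat; any other value must be an integer of at least 1000 ms.
  if (!Number.isInteger(n) || (n !== 0 && n < 1000)) {
    throw new Error(`DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS must be 0 or an integer >= 1000, got "${raw}"`);
  }
  return n;
}

async function waitWithHeartbeat<T>(wait: Promise<T>, emit: Emit, heartbeatMs: number): Promise<T> {
  emit({ phase: "health-gate" });
  const timer = heartbeatMs > 0 ? setInterval(() => emit({ phase: "health-gate" }), heartbeatMs) : undefined;
  try {
    return await wait;
  } finally {
    // Stop re-emitting before the caller sends the terminal event, whichever way the wait resolved.
    if (timer !== undefined) clearInterval(timer);
  }
}
```

Clearing the interval in `finally` covers success, timeout, and unhealthy outcomes alike. Rejections from `wait` still propagate to the caller.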
Changed
- Crowdin export configuration aligned with app locale folders. Crowdin now maps language codes such as `es-ES` onto the locale folder IDs the UI actually loads (for example `es`) and only downloads languages exposed in the locale picker. A new config guard test prevents future sync PRs from adding ignored region-coded folders. The new auto-hidden-columns tooltip source also avoids the English-only `column(s)` punctuation that triggered Crowdin QA warnings for translated strings.
- Shared DataTable column sizing overhaul (commit
`596adcd2`). All first-party table surfaces now route through the shared `DataTable` component, using numeric sizing metadata (`size`, `minSize`, `maxSize`, `flex`, `priority`, `overflow`, `autoSize`) instead of ad-hoc string widths. Tables render a stable `<colgroup>` and keep actions in an independent, sticky/fixed managed column. They also support pointer and keyboard column resizing, autosize a column to its visible content on double-click, and persist manual and autosized widths per table via browser preferences. The Containers view uses the sizing data for responsive auto-hide math, so narrow widths hide lower-priority metadata instead of shrinking columns below readable minimums. The config webhook endpoint list was migrated too, and a new architecture test fails if raw `<table>` markup or string column widths reappear in `ui/src`.
- Watcher dispatch is fully fire-and-forget (commit
`5cfa2286`). `Trigger.runUpdateAvailableSimpleTrigger` and `runAcceptedUpdateBatch` previously awaited `runAcceptedContainerUpdates`, so a slow update lifecycle stalled the next watcher tick. The API path was already fire-and-forget, and the watcher path now matches it. A new `dispatchAccepted(accepted)` helper centralises the `void runAcceptedContainerUpdates(...).catch(() => undefined)` pattern across all four call sites. Per-operation failures are still terminalised inside the lifecycle handler, so swallowing the dispatch chain's rejection loses no observable information.
- Security alert emit is non-blocking inside the update lifecycle (commit
`6c5198dd`). `SecurityGate.maybeEmitHighSeverityAlert` was awaited inside `evaluateScanOutcome`, which itself runs on the update lifecycle's critical path. With multiple notifiers registered for security alerts, the await chained sequential provider calls (SMTP, Slack, HTTP, MQTT, webhook) into the lifecycle, multiplying latency before pull/recreate could even start. The function now returns synchronously after firing the emit; notification dispatch semantics from t...
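The responsive auto-hide behaviour from the DataTable overhaul can be illustrated with a small function. It walks columns in priority order and hides the least important ones once their minimum widths no longer fit. This is a hypothetical sketch, not the shared `DataTable` code. `visibleColumns` is an illustrative name, the meaning "lower `priority` value = more important" is an assumption, and the independent actions column is left out.

```typescript
// Hypothetical subset of the sizing metadata; only the field names come from the notes.
interface ColumnSizing {
  id: string;
  minSize: number;  // px; the column is never squeezed below this
  priority: number; // assumed: lower value = more important, hidden last
}

// Keep the most important columns whose minimum widths fit in the available space,
// hide the rest, and return the survivors in their original display order.
function visibleColumns(columns: ColumnSizing[], availablePx: number): string[] {
  const byImportance = columns.slice().sort((a, b) => a.priority - b.priority);
  const kept = new Set<string>();
  let used = 0;
  for (const col of byImportance) {
    if (used + col.minSize > availablePx) break; // hide this column and every less important one
    kept.add(col.id);
    used += col.minSize;
  }
  return columns.filter((col) => kept.has(col.id)).map((col) => col.id);
}
```

Using `break` hides columns strictly by priority. Switching to `continue` would let a narrow, low-priority column fill leftover space, which is a design choice this sketch does not make.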
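The watcher's fire-and-forget dispatch can be sketched as follows. The `void ... .catch(() => undefined)` shape comes from the watcher dispatch entry. The `AcceptedUpdate` type, the `watcherTick` caller, and the stand-in body of `runAcceptedContainerUpdates` are assumptions for illustration only.

```typescript
interface AcceptedUpdate {
  operationId: string;
  containerName: string;
}

const outcomes: string[] = [];

// Stand-in for the real lifecycle runner (assumed behaviour): each update takes a while,
// and a failing update is recorded before the chain rejects.
async function runAcceptedContainerUpdates(accepted: AcceptedUpdate[]): Promise<void> {
  for (const update of accepted) {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulate a slow lifecycle
    if (update.containerName.startsWith("bad")) {
      outcomes.push(`${update.operationId}:failed`); // failure already recorded per operation
      throw new Error(`update ${update.operationId} failed`);
    }
    outcomes.push(`${update.operationId}:done`);
  }
}

// Fire-and-forget: start the lifecycle without awaiting it, and swallow the chain's rejection
// so it never surfaces as an unhandled promise rejection.
function dispatchAccepted(accepted: AcceptedUpdate[]): void {
  void runAcceptedContainerUpdates(accepted).catch(() => undefined);
}

// Hypothetical watcher tick: returns immediately instead of waiting for the updates.
function watcherTick(accepted: AcceptedUpdate[]): string {
  dispatchAccepted(accepted);
  return "tick complete";
}
```

The `.catch` is what makes swallowing safe here. Because each failure is already recorded inside the lifecycle, the discarded rejection carries no extra information.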