Releases: CodesWhat/drydock

v1.5.0-rc.27

24 May 11:10
afee8cb

Pre-release

Full Changelog: v1.5.0-rc.26...v1.5.0-rc.27

[1.5.0-rc.27] — 2026-05-24

Fixed

  • #289 — Agent-hosted container updates no longer leave an orphaned queued operation row on the controller that the 30-minute TTL sweep force-fails into a misleading "update failed" Pushover/Telegram notification long after the update actually succeeded. A user running drydock in a controller + agent topology reported on rc.25 that an "Update All" of Tautulli on two hosts produced the success notification only for the controller-host container; the agent-host container's success notification was missing and, ~30 minutes later, a second Pushover arrived saying [mediavault] Container Tautulli update failed — Marked failed after exceeding active update TTL (1800000ms) while queued. even though the update had in fact succeeded on the agent. Cause: when the controller queues a container update via createAcceptedContainerUpdateRequest (app/updates/request-update.ts) it mints a controller-side operationId and inserts a queued row; the dispatcher then calls entry.trigger.trigger(entry.container, { operationId }). For containers hosted on an agent the trigger is AgentTrigger, whose trigger(container) previously accepted only the container and discarded the runtimeContext. AgentClient.runRemoteTrigger posted {id, name} to the agent without the operationId, so the agent's /api/triggers/:type/:name endpoint called requestContainerUpdate with no operationId and minted its own row; the agent's dd:update-applied / dd:update-operation-changed events then arrived back at the controller carrying the agent-side id, which the controller routed through toAgentScopedId into a third, agent-scoped row (agent-<name>-<remote-id>). The original controller-side queued row was therefore never touched, sat queued past the UPDATE_OPERATION_ACTIVE_TTL_MS deadline in app/store/update-operation.ts:295-300, and was force-failed by the TTL sweep — which fired the misleading "failed" notification with the row's still-valid container snapshot (hence the correct [mediavault] agent prefix). 
The fix threads the controller's operationId end-to-end so a single row is the source of truth for the whole lifecycle: AgentTrigger.trigger / triggerBatch now accept and forward runtimeContext; AgentClient.runRemoteTrigger / runRemoteTriggerBatch extract per-container operationIds via the existing getRequestedOperationId helper and include them in the agent payload ({id, name, operationId} for single triggers; {...container, operationId} per entry for batches); the agent-side controller runTrigger accepts an operationId in the request body (validated by triggerRequestBodySchema) and threads it into requestContainerUpdate; the agent-side batch endpoint extracts per-container operationIds into an {operationIds} runtimeContext before forwarding to the local trigger; EnqueueContainerUpdateOptions gains an operationId field honored by createAcceptedContainerUpdateRequest (single-container batches only; multi-container batches still mint per-container UUIDs); and a new AgentClient.resolveAgentOperationId helper checks the controller's operation store for an existing row at the raw (unscoped) id and reuses it when found — falling back to the toAgentScopedId form only when the agent does not echo a known controller id, preserving backwards compatibility with older agents. The controller-side queued row therefore transitions directly to in-progress and succeeded/failed from the agent's lifecycle events, no parallel agent-scoped row is created, the TTL sweep has nothing stale to fail, and the spurious "update failed" notification disappears.
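The id-resolution rule described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `OperationRow` shape and the `Map`-based store are assumptions, and only the names `resolveAgentOperationId` and `toAgentScopedId` and the `agent-<name>-<remote-id>` format come from the entry itself.

```typescript
// Hypothetical, simplified stand-in for a row in the controller's operation store.
interface OperationRow {
  id: string;
  status: "queued" | "in-progress" | "succeeded" | "failed";
}

// Agent-scoped id format quoted in the entry above: agent-<name>-<remote-id>.
function toAgentScopedId(agentName: string, remoteId: string): string {
  return `agent-${agentName}-${remoteId}`;
}

// Reuse the controller's own row when the agent echoes an id the controller
// already knows; otherwise fall back to the agent-scoped form (older agents).
function resolveAgentOperationId(
  operations: Map<string, OperationRow>,
  agentName: string,
  remoteId: string,
): string {
  return operations.has(remoteId)
    ? remoteId
    : toAgentScopedId(agentName, remoteId);
}
```

With a known id such as `op-123` already queued on the controller, the lifecycle events update that row in place; an unknown id from an older agent still lands in its own `agent-mediavault-…` row.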

  • #289 — Update-applied and update-failed notification triggers (Pushover, Telegram, etc.) and UI success toasts no longer silently drop for containers running on a connected agent. A user running drydock in a controller + agent topology reported on rc.25 that an "Update All" across two hosts produced the success toast and Pushover notification only for the container on the controller host, never for the same-name container on the agent host. Cause: when the agent finishes an update it sends a dd:update-applied SSE payload to the controller carrying a full container snapshot. The controller's AgentClient.handleEvent routes this through maybeMarkAgentOperationSucceededFromAppliedPayload → markAgentOperationTerminal → ensureAgentOperationForTerminal → updateOperationStore.insertOperation + markOperationTerminal, but buildAgentOperationBase in app/agent/AgentClient.ts constructed the inserted row from {id, kind, containerName, containerId, newContainerId} only — the container snapshot was dropped on the floor. When markOperationTerminal then fired emitTerminalLifecycleEvent (app/store/update-operation.ts), the resulting emitContainerUpdateApplied / emitContainerUpdateFailed payload built by buildTerminalLifecycleEventBase lacked container. The notification handler handleContainerUpdateAppliedEvent (app/triggers/providers/Trigger.ts) then fell back to findContainerByBusinessId(containerName), which compares the agent's bare containerName (e.g. tautulli) against the controller-side fullName (e.g. mediavault_docker_tautulli) and silently dropped — the same class of findContainerByBusinessId miss as #385 but on the agent-scoped operation path that #385 did not cover. 
The fix threads the agent's container snapshot through every level of the agent-scoped operation pipeline — buildAgentOperationBase, ensureAgentOperationForTerminal, markAgentOperationTerminal, maybeMarkAgentOperationSucceededFromAppliedPayload, and maybeMarkAgentOperationFailedFromFailedPayload — stamping agent: this.name so the controller's view of the container is consistent. The dd:update-operation-changed-before-dd:update-applied race is handled by patching the container snapshot onto the existing active row via updateOperation before the terminal emit runs (only when the existing row lacks a container, never overwriting an existing snapshot). container is added to MutableUpdateOperationFields in app/store/update-operation.ts so terminal and active patches accept it. The store's terminal-lifecycle emit therefore naturally carries the agent's container into emitContainerUpdateApplied / emitContainerUpdateFailed, the payloadContainer shortcut in the trigger handler succeeds, and both the notification trigger and the SSE toast fire end-to-end on the controller for agent-originated updates.
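The "patch only when missing" rule for the race described above might look something like this sketch. The `ContainerSnapshot` and `UpdateOperation` shapes and the helper name `patchMissingContainer` are hypothetical; the entry only states that the snapshot is applied when the existing row lacks a container and that `agent` is stamped with the agent's name.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ContainerSnapshot {
  name: string;
  agent?: string;
}

interface UpdateOperation {
  id: string;
  status: string;
  container?: ContainerSnapshot;
}

// Attach the agent's snapshot to an active row, but never overwrite a
// snapshot that is already there.
function patchMissingContainer(
  operation: UpdateOperation,
  snapshot: ContainerSnapshot | undefined,
  agentName: string,
): UpdateOperation {
  if (operation.container !== undefined || snapshot === undefined) {
    return operation; // nothing to patch, or nothing to patch with
  }
  return { ...operation, container: { ...snapshot, agent: agentName } };
}
```

Returning the original object untouched in the no-op cases keeps the helper safe to call from both the applied and failed paths.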

v1.5.0-rc.26

22 May 20:49
512c375

Pre-release

Full Changelog: v1.5.0-rc.25...v1.5.0-rc.26

[1.5.0-rc.26] — 2026-05-22

Fixed

  • Image reference construction — unanchored /v2 strip could silently corrupt references when the image name contained a /v2 path segment. Registry.getImageFullName and the controller-mode fallback in resolveContainerImageFullName both applied .replace(/\/v2/, '') to the fully concatenated registryUrl/imageName:tag string. Because the regex was unanchored and non-global, if the image name contained a /v2 segment (e.g. library/v2/tool) the strip would remove it from the image name rather than the registry URL — producing a silently wrong reference handed to Trivy. The fix extracts a shared pure helper buildImageReference (app/registries/image-reference.ts) that cleans the registry URL before concatenation using anchored regexes (^https?:\/\/ and /v2\/?$) so the URL scheme and trailing /v2 API path are removed without touching anything in the image name. Both Registry.getImageFullName and the fallback branch of resolveContainerImageFullName now delegate to this helper, eliminating the duplicate logic.
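To make the difference between the two strips concrete, here is a minimal sketch. The two regexes are the ones quoted in the entry; the function signature is an assumption (the real `buildImageReference` may take different arguments), and `stripV2Unanchored` is a hypothetical reduction of the old behaviour to its `.replace(/\/v2/, '')` step.

```typescript
// Assumed signature: clean the registry URL first, then concatenate.
function buildImageReference(
  registryUrl: string,
  imageName: string,
  tag: string,
): string {
  const registryHost = registryUrl
    .replace(/^https?:\/\//, "") // anchored: only a leading scheme
    .replace(/\/v2\/?$/, ""); // anchored: only a trailing /v2 or /v2/
  return `${registryHost}/${imageName}:${tag}`;
}

// Reduced reproduction of the old strip: unanchored and non-global, applied
// to the already concatenated string, so it removes the first /v2 anywhere.
function stripV2Unanchored(
  registryUrl: string,
  imageName: string,
  tag: string,
): string {
  return `${registryUrl}/${imageName}:${tag}`.replace(/\/v2/, "");
}

console.log(stripV2Unanchored("ghcr.io", "library/v2/tool", "1.0"));
// prints "ghcr.io/library/tool:1.0" (the image name lost its /v2 segment)
console.log(buildImageReference("ghcr.io", "library/v2/tool", "1.0"));
// prints "ghcr.io/library/v2/tool:1.0"
```

Cleaning the URL before concatenation is what makes the anchors meaningful: `$` can only match the end of the registry URL, never a position inside the image name.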

  • #386 — Agents intermittently showing 0 running containers in the controller UI — a second recurrence the rc.25 fix did not close. The rc.25 fix suppressed the authoritative watcher snapshot whenever container enumeration failed or per-container enrichment errors dropped containers, but the recurrence reported on rc.25 is a different failure mode: a cold-start race between the controller's handshake and the agent's first watch cycle. When the controller's AgentClient (re)connects to an agent's SSE stream it handshakes immediately via GET /api/containers; if the agent's watchatstart cron has not yet finished its first run, the agent's in-memory store is still empty and the handshake legitimately receives 0 containers (the agent log shows Handshake successful. Received 0 containers. ~5 s before Cron finished (4 containers watched, 0 errors)). The handshake then fires emitAgentConnected, the UI re-fetches /api/v1/agents, and the agent's running-container count renders 0. When the agent's cron completes moments later it pushes a dd:watcher-snapshot, and AgentClient.handleWatcherSnapshotEvent ingests the four containers into the controller store correctly — but nothing told the UI to refresh, because AgentsView only re-fetches the agent summary on agent-status-changed / connected / resync-required events, not on container-added / container-updated. The stale 0 therefore persisted until an unrelated reconnect event (such as an agent restart) fired. The fix adds a dedicated AgentStatsChanged event: app/event/index.ts gains emitAgentStatsChanged / registerAgentStatsChanged (mirroring the existing AgentConnected pair); AgentClient.handleWatcherSnapshotEvent now emits emitAgentStatsChanged({ agentName }) after every completed watcher snapshot; app/api/sse.ts broadcasts it to UI SSE clients as dd:agent-stats-changed; and ui/src/stores/eventStream.ts maps that to the existing agent-status-changed bus event. 
A completed agent watch cycle therefore always refreshes the controller's agent-summary count, even when the handshake raced ahead of the agent's first cron.

  • #342 — A container is no longer shown as "update available" with a blank target version after a transient registry error. hasRawUpdate in app/model/container.ts compared transformTag(image.tag.value) against transformTag(result.tag) without guarding an undefined result.tag. When a registry scan failed mid-flight (for example a Docker Hub or GHCR 429) and left a container result present but its tag unset, transformTag(undefined) returned undefined, the localTag !== remoteTag comparison evaluated true, and the container was flagged updateAvailable with an unknown update kind — which the UI renders as an update with no target version (the reporter saw this on immich_redis). hasRawUpdate now performs the tag comparison only when both image.tag.value and result.tag are defined, matching the existing guard in getRawTagUpdate. Digest-only updates are unaffected: a container with an undefined result.tag but a genuine digest change still reports the digest update.
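The failure mode and the guard can be illustrated with a small sketch. The `transformTag` below is a trivial stand-in (the real one applies the container's configured tag transform), and both comparison helpers are hypothetical names that isolate just the tag check inside `hasRawUpdate`.

```typescript
// Stand-in for the real transformTag: passes undefined straight through.
function transformTag(tag: string | undefined): string | undefined {
  return tag === undefined ? undefined : tag.trim();
}

// Old behaviour: an unset remote tag compares as "different".
function tagDiffersUnguarded(
  localTag: string | undefined,
  remoteTag: string | undefined,
): boolean {
  return transformTag(localTag) !== transformTag(remoteTag);
}

// Guarded behaviour: compare only when both sides are defined.
function tagDiffersGuarded(
  localTag: string | undefined,
  remoteTag: string | undefined,
): boolean {
  if (localTag === undefined || remoteTag === undefined) {
    return false;
  }
  return transformTag(localTag) !== transformTag(remoteTag);
}

console.log(tagDiffersUnguarded("7.2", undefined)); // prints true (phantom update)
console.log(tagDiffersGuarded("7.2", undefined)); // prints false
```

Digest comparison is a separate check in the entry above, so a guard like this does not hide genuine digest-only updates.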

  • #386 follow-through — the controller's agent-summary container count now also refreshes on docker-event-driven container changes, not only completed cron cycles. The initial #386 fix emitted emitAgentStatsChanged from AgentClient.handleWatcherSnapshotEvent, the cron-watch path. An agent also ingests individual container add/remove/update events from the Docker event stream between cron cycles (handleContainerChangeEvent, handleContainerRemovedEvent), via the controller-initiated watch() path, and via the per-container controller-initiated watchContainer() path — none of which emitted the stats-changed signal, so a container started or stopped on an agent host could leave the AgentsView running-container count stale until the next 6-hourly cron. All four paths now emit emitAgentStatsChanged after mutating the controller store, keeping the count current in real time.

  • #342 — GitHub release-notes lookups now survive GitHub's secondary rate limit instead of giving up on the first burst. Drydock authenticates its api.github.com release-notes requests by reusing the configured GHCR token, but a watch cycle still fans out a lookup for every watched container at once and trips GitHub's secondary rate limit — a 403 GitHub returns to authenticated callers who burst too many requests. The shared retry helper (app/registries/http-retry.ts) only retried 429/503, so the secondary-limit 403 was never retried: the provider logged GitHub release notes lookup is rate-limited and returned nothing. withRetry gains two optional, opt-in hooks — retryPredicate (retry a status outside retryableStatuses) and retryDelayMs (per-attempt delay override) — leaving every existing caller unchanged. GithubProvider classifies a 403 as a secondary rate limit only when it carries a retry-after header or x-ratelimit-remaining: 0, retries those (honouring retry-after / x-ratelimit-reset for the delay), and leaves a genuine 403 authorization failure failing fast as before. Once retries are exhausted the provider arms a short module-level cooldown — driven by GitHub's own retry hint, floored at the 60 s default so a retry-after: 0 hint cannot produce an already-expired cooldown — during which further release-notes lookups are skipped, so a single cron cycle no longer hammers an already-tripped limit container after container. The rate-limit warning now also records whether the request was authenticated.
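The classification and cooldown rules above could be expressed along these lines. Everything here is a hedged sketch: `HeaderMap`, `isSecondaryRateLimit`, and `cooldownMs` are hypothetical names, header keys are assumed to be lower-cased, only the delta-seconds form of `retry-after` is handled, and the `x-ratelimit-reset` path mentioned in the entry is left out for brevity.

```typescript
// Assumed representation: lower-cased header names mapped to raw values.
type HeaderMap = Record<string, string | undefined>;

// A 403 counts as a secondary rate limit only when GitHub signals it via
// retry-after or x-ratelimit-remaining: 0; any other 403 fails fast.
function isSecondaryRateLimit(statusCode: number, headers: HeaderMap): boolean {
  if (statusCode !== 403) {
    return false;
  }
  return (
    headers["retry-after"] !== undefined ||
    headers["x-ratelimit-remaining"] === "0"
  );
}

const DEFAULT_COOLDOWN_MS = 60_000;

// Cooldown follows the retry hint but never drops below the default, so a
// "retry-after: 0" hint cannot yield an already-expired cooldown.
function cooldownMs(headers: HeaderMap): number {
  const retryAfterSeconds = Number(headers["retry-after"]);
  const hintedMs = Number.isFinite(retryAfterSeconds)
    ? retryAfterSeconds * 1000
    : 0;
  return Math.max(hintedMs, DEFAULT_COOLDOWN_MS);
}
```

A predicate of this shape is the kind of thing the opt-in `retryPredicate` hook is described as accepting, while existing callers that pass no predicate keep the 429/503-only behaviour.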

  • #342 — the registry-error tooltip on the Containers view now names the registry that failed. When a registry tag lookup errors (for example a 429 rate limit) the container shows a registry-error badge whose tooltip previously rendered only the raw transport message — Registry error: Request failed with status code 429 — with no indication of which registry was queried. registryErrorTooltip in ui/src/views/ContainersView.vue now derives the registry hostname from the container's registryUrl and renders it through a new registryError.detailWithRegistry i18n string ({registryHost} — {error}), e.g. ghcr.io — Request failed with status code 429. Containers whose registryUrl is absent or unparseable fall back to the original message unchanged.
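Deriving the hostname with a safe fallback might be done as in the sketch below. The helper name `registryHostFromUrl` is hypothetical; the entry only says the host is derived from `registryUrl` and that absent or unparseable values fall back to the original message.

```typescript
// Returns the hostname of a registry URL, or undefined when the value is
// absent or cannot be parsed (callers then keep the original tooltip).
function registryHostFromUrl(
  registryUrl: string | undefined,
): string | undefined {
  if (!registryUrl) {
    return undefined;
  }
  try {
    const host = new URL(registryUrl).hostname;
    return host.length > 0 ? host : undefined;
  } catch {
    return undefined; // new URL() throws on values it cannot parse
  }
}

console.log(registryHostFromUrl("https://ghcr.io/v2")); // prints "ghcr.io"
```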

v1.5.0-rc.25

21 May 17:22
42d5f84

Pre-release

[1.5.0-rc.25] — 2026-05-21

Fixed

  • #371 — Containers "Group By Stack" view no longer dissolves a multi-container stack into "Ungrouped" while its last container is mid-update. The flatten rule in groupedContainers (ui/src/views/ContainersView.vue) previously keyed off the transient live container count (buckets[key].length === 1). During a docker recreate a 2-container stack momentarily shows only 1 live container (old removed, new not yet added), so the rule fired and dropped the stack header. The fix adds a groupAssignedSizeMap ref (populated by loadGroups() from the groups API response and reset to {} on error) that records each group's API-assigned member count. The flatten condition is now buckets[key].length === 1 && groupAssignedSizeMap.value[key] === 1 — a strict equality check so stacks whose assigned size is > 1 or transiently absent from the API response are never flattened mid-update. Genuine single-container stacks (assigned size exactly 1) are still flattened as before (GitHub Discussion #179).
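The new flatten condition reduces to a two-part check, sketched here as a standalone function. `shouldFlattenGroup` is a hypothetical name; in the component the inputs are `buckets[key].length` and `groupAssignedSizeMap.value[key]`.

```typescript
// Flatten only genuine single-container stacks: exactly one live container
// AND an API-assigned size of exactly one. An absent assigned size
// (undefined) never flattens.
function shouldFlattenGroup(
  liveCount: number,
  assignedSize: number | undefined,
): boolean {
  return liveCount === 1 && assignedSize === 1;
}

console.log(shouldFlattenGroup(1, 2)); // prints false (2-container stack mid-recreate)
console.log(shouldFlattenGroup(1, undefined)); // prints false (size unknown)
console.log(shouldFlattenGroup(1, 1)); // prints true (genuine single-container stack)
```

Using strict equality against `1` rather than a "not greater than 1" check is what makes a missing map entry fail safe.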

  • #386 — Agents intermittently showing 0 running containers in the controller UI — a recurrence of #362 that the rc.20 fix did not fully close. The rc.20 guard introduced a containerEnumerationFailed flag in Docker.watch() (app/watchers/providers/docker/Docker.ts) that suppresses the authoritative emitWatcherSnapshot when getContainers() itself throws. However, getContainers() does not throw on per-container enrichment failures: addImageDetailsToContainer() is called for each watched container, and any container whose enrichment throws is caught (.catch(error => error)) and then silently filtered out by .filter(result => !(result instanceof Error) && result != null). A transient docker / socket-proxy hiccup during image inspect can therefore cause getContainers() to return a short or empty array without throwing — the containerEnumerationFailed guard does not fire, watch() emits an authoritative emitWatcherSnapshot with the degraded container list, and the controller's AgentClient.handleWatcherSnapshotEvent prunes every container not in that list, wiping the agent's view. The agent's own store is preserved because its local prune re-confirms each container via inspect(), which is why the agent kept reporting its containers and a restart's handshake re-synced the controller. The fix extends the snapshot-suppression in two steps: getContainers() now accepts an optional diagnostics out-parameter and writes the number of containers dropped due to enrichment errors into diagnostics.enrichmentErrors; watch() creates and passes this object on every call, logs a Container enumeration degraded warning when the count is non-zero, and suppresses emitWatcherSnapshot whenever either containerEnumerationFailed is true or enumerationDiagnostics.enrichmentErrors > 0. Per-container reports still emit as before; only the authoritative controller-side prune is deferred until a fully clean watch cycle.
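The out-parameter pattern and the combined suppression check can be sketched as below. This is a synchronous simplification under stated assumptions: the real `getContainers()` is promise-based, and `enrichAll` and `shouldEmitSnapshot` are hypothetical names. Only `enrichmentErrors` and `containerEnumerationFailed` come from the entry.

```typescript
// Out-parameter the caller creates and passes in on every watch cycle.
interface EnumerationDiagnostics {
  enrichmentErrors: number;
}

// Enrich every item, dropping failures but counting them instead of
// discarding the information silently.
function enrichAll<T, R>(
  items: T[],
  enrich: (item: T) => R,
  diagnostics?: EnumerationDiagnostics,
): R[] {
  const enriched: R[] = [];
  for (const item of items) {
    try {
      enriched.push(enrich(item));
    } catch {
      if (diagnostics) {
        diagnostics.enrichmentErrors += 1;
      }
    }
  }
  return enriched;
}

// The authoritative snapshot is emitted only after a fully clean cycle.
function shouldEmitSnapshot(
  containerEnumerationFailed: boolean,
  diagnostics: EnumerationDiagnostics,
): boolean {
  return !containerEnumerationFailed && diagnostics.enrichmentErrors === 0;
}
```

Keeping the parameter optional means callers that do not care about the count need no changes, which matches the entry's description of it as an optional out-parameter.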

  • #385 — Telegram, Pushover, and other notification triggers no longer silently swallow update-applied and update-failed events after a compose recreate or on multi-agent deployments. When an update routed through the operation queue completed, the terminal lifecycle event (update-applied on success, update-failed on failure/rolled-back) was emitted from app/store/update-operation.ts:buildTerminalLifecycleEventBase with only containerName / containerId / operationId on the payload — no container object. Notification handlers in app/triggers/providers/Trigger.ts fell back to findContainerByBusinessId(containerName), which missed during the ~8 s window between the old container being removed and the new one being re-watched after a compose recreate; the handler then dropped the event with a No container found for update-applied event => ignore debug log. This was the same class of race as #355 but for the operation-queue-driven path that bypasses UpdateLifecycleExecutor's direct emit. The fix persists a snapshot of the Container on the operation entry at enqueue time (app/updates/request-update.ts:createAcceptedContainerUpdateRequest) and buildTerminalLifecycleEventBase now forwards that snapshot on the terminal-lifecycle payload — both update-applied and update-failed, closing the race for compose successes and failures alike. The agent SSE wire was also extended to forward the container snapshot end-to-end so multi-agent deployments get the same fix: sanitizeUpdateAppliedPayloadForAgentSse and sanitizeUpdateFailedPayloadForAgentSse in app/agent/api/event.ts include container when present (previously stripped to scalars only), and the controller's AgentClient.parseUpdateFailedEventPayload accepts and decorates an inbound container with the source agent name to mirror the existing applied-path behaviour. 
The snapshot is internal-only: a new toApiUpdateOperation helper in app/store/update-operation.ts strips it before serialising operations through GET /api/v1/update-operations/:id, GET /api/containers/:id/update-operations, and POST /api/operations/:id/cancel, so container labels and details.env are not exposed to API consumers.
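Stripping the internal snapshot before serialisation can be as small as the following sketch. The `StoredUpdateOperation` shape is an assumption with most fields omitted; only the helper name `toApiUpdateOperation` and the `container` field come from the entry.

```typescript
// Hypothetical, trimmed-down stored shape; the real row has more fields.
interface StoredUpdateOperation {
  id: string;
  status: string;
  containerName: string;
  container?: Record<string, unknown>; // internal snapshot (labels, env, ...)
}

type ApiUpdateOperation = Omit<StoredUpdateOperation, "container">;

// Copy every field except the internal container snapshot.
function toApiUpdateOperation(
  operation: StoredUpdateOperation,
): ApiUpdateOperation {
  const { container: _internalSnapshot, ...publicFields } = operation;
  return publicFields;
}
```

Expressing the API type as `Omit<…, "container">` lets the compiler flag any route that tries to return the stored row directly.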

v1.5.0-rc.24

17 May 14:12
520649f

Pre-release

[1.5.0-rc.24] — 2026-05-17

Changed

  • Translations refreshed from Crowdin (commit 202f3d83). Human translations were synced from Crowdin for the ~110-string rc.23 i18n extraction sweep, updating the 16 non-English locales across the appShell, containerComponents, listViews, sharedComponents, configView, agentsView, and notificationOutboxView namespaces. Strings that were previously falling back to English now render in each locale.

Fixed

  • #370 — Containers list "Version" column again shows the human-readable image tag for floating-tag + digest-watch containers, restoring the #356 fix that rc.20 inadvertently reverted. The rc.20 #342 follow-up (commit b40d3db8) added a visible sha256:… → sha256:… digest pair to the Containers table "Version" cell and card body for all updateKind === 'digest' containers that are not digest-pinned. The intent was to surface the digest transition for hybrid containers where both the tag and the underlying image layer changed simultaneously; however, the change cast too wide a net: it also applied to floating-tag + digest-watch containers (e.g. prom/prometheus:latest, linuxserver/plex with a transform tag) — exactly the rows that #356 fixed to show the human-readable tag instead of raw digest strings. The updateKind === 'digest' && !isDigestPinned branch of the table version cell and card body in ui/src/components/containers/ContainersGroupedViews.vue has been restored to the rc.19 behaviour: the version cell renders c.currentTag as a CopyableTag (with the full digest delta in the cell tooltip), and the card body shows only the update-state badge (with the digest delta in the badge tooltip). The digest transition remains visible through the adjacent "kind" column update-state indicator and the container detail panels. Digest-pinned containers (where isDigestPinned is true) are unaffected and continue to show the sha256:… → sha256:… pair directly in the cell.

  • #374 — Security scans no longer hand Trivy a raw registry v2 API URL, which had caused every scan in controller mode to fail. resolveContainerImageFullName (app/api/container/shared.ts), used by both the security scan scheduler and the container API, falls back to composing the image reference directly from container.image.registry.url whenever the container's registry component is not present in the controller's registry state — the normal situation in controller mode (DD_LOCAL_WATCHER=false), where registries are configured on the agents rather than on the controller. registry.url is stored in the registry v2 API base form (e.g. https://registry-1.docker.io/v2), so the fallback produced references such as https://registry-1.docker.io/v2/dgtlmoon/sockpuppetbrowser:0.0.3; Trivy then interpreted the scheme as a hostname and every scan failed with dial tcp: lookup https. The fallback now mirrors Registry.getImageFullName: it strips the URL scheme and the /v2 path segment and uses an @ separator for digest references, yielding a plain registry-1.docker.io/dgtlmoon/sockpuppetbrowser:0.0.3 reference. Containers whose registry component is available are unaffected — they already resolved through the correct getImageFullName path.

v1.5.0-rc.23

16 May 23:28
fb40b47

Pre-release

[1.5.0-rc.23] — 2026-05-16

Added

  • Self-update now works when Drydock reaches the Docker daemon over a TCP host, not only through a bind-mounted /var/run/docker.sock (commit fc34ffb9). The self-update helper container — the short-lived container that outlives Drydock to stop the old instance, health-check the replacement, and commit or roll back — was hardcoded to a bind-mounted Unix socket and aborted with Self-update requires the Docker socket to be bind-mounted whenever Drydock's watcher was configured with a TCP host. That is the normal setup when a Docker socket proxy (such as sockguard or docker-socket-proxy) mediates daemon access, so self-update was unavailable for those deployments even though every other container updated correctly. resolveHelperDockerConnection now inspects the watcher's Dockerode connection: a TCP host produces a TCP helper that is attached to Drydock's own Docker network (the container's NetworkMode is cloned so the helper can resolve the proxy by DNS) and receives DD_SELF_UPDATE_DOCKER_HOST / DD_SELF_UPDATE_DOCKER_PORT / DD_SELF_UPDATE_DOCKER_PROTOCOL instead of a socket bind mount; runSelfUpdateController builds a TCP Dockerode client from those variables and skips the socket-only API-version probe and redirect guard. The bind-mounted-socket path is unchanged. When self-update runs through a filtering socket proxy the Drydock container must carry the proxy's ownership label so the helper is permitted to stop and replace it — see content/docs/current/configuration/self-update/index.mdx.

  • The per-container Update button is locked with a Self-update unavailable indicator when Drydock cannot update itself in the current deployment (commit cf777280). A new hard self-update-unavailable update-eligibility blocker is raised for the Drydock self-container when an update is available but self-update can run neither over a bind-mounted socket nor a TCP host — i.e. the watcher uses a Unix socket and /var/run/docker.sock is not present in the container. The blocker locks the per-row Update button with an explanatory tooltip and makes POST /containers/:id/update return 409, instead of the previous behaviour where the button appeared actionable and the update failed mid-flight with a socket error. Deployments that reach Docker over TCP report self-update as available, so the button is unaffected there. The check fails open: when the watcher cannot be resolved the blocker is not raised.

  • i18n coverage extended to the remaining hardcoded UI strings across 28 components (discussion #329, commit 1b65e591). A full audit of all 82 UI components found approximately 110 English strings that bypassed vue-i18n and rendered raw regardless of the active locale. The extraction sweep covers: AppLayout search scopes, group labels, section subtitles, and the five deprecation banner bodies (converted to <i18n-t> so embedded <code> elements stay translatable); ThemeToggle variant names; DataFilterBar view-mode names; DetailPanel size labels (S / M / L); update-kind labels (Major / Minor / Patch / Digest) in ContainerFullPageDetail and ContainerFullPageTabContent, which are now reactive computed maps so locale switches take effect without a page reload; tail/status labels and the stdout/stderr stream-type labels in ContainerLogs; action tooltips and button labels in ContainersGroupedViews; error messages and empty-state fallback labels across the Agents, Config, Registries, Triggers, Watchers, Notifications, NotificationOutbox, and Security views; and the WATCHING watcher-status badge, which was rendering the raw backend enum string. New keys land in en/appShell.json, en/containerComponents.json, en/listViews.json, en/sharedComponents.json, en/configView.json, en/agentsView.json, and en/notificationOutboxView.json. Non-translatable identifiers — the product name "Drydock" and format strings such as spdx-json / cyclonedx-json — are intentionally left as literals. Other locales pick up the new keys via the en fallback immediately and will receive human translations on the next Crowdin sync.

Changed

  • Self-update helper now prefers the bind-mounted Docker socket over a TCP watcher connection (commit aa828d88). The previous resolveHelperDockerConnection logic checked the watcher's TCP modem first, meaning that any deployment where Drydock was configured with a TCP host (e.g. routing through a socket proxy) would always route the helper through that proxy — even when the target container itself had /var/run/docker.sock bind-mounted. For infrastructure updates (dd.update.mode=infrastructure), where the container being replaced is the socket proxy, this is fatal: the helper relies on the proxy being up, but the update stops it. The resolution order is now inverted: findDockerSocketBind runs first, and if the target container carries a socket bind the helper uses that direct socket path regardless of the watcher's TCP configuration. The TCP path is preserved as the fallback for pure socket-less deployments where Drydock reaches Docker exclusively over a remote host.
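The inverted resolution order can be sketched as follows. The types and signatures are assumptions for illustration: `findDockerSocketBind` is named in the entry but its real signature is not shown, and the bind strings are assumed to use Docker's `host-path:container-path[:mode]` form.

```typescript
type HelperConnection =
  | { kind: "socket"; bind: string }
  | { kind: "tcp"; host: string; port: number };

// Assumed signature: look for a bind whose container-side path is the
// Docker socket.
function findDockerSocketBind(binds: string[]): string | undefined {
  return binds.find((bind) => bind.split(":")[1] === "/var/run/docker.sock");
}

// Socket bind first; the watcher's TCP connection is only the fallback.
function resolveHelperConnection(
  targetBinds: string[],
  watcherTcp?: { host: string; port: number },
): HelperConnection | undefined {
  const socketBind = findDockerSocketBind(targetBinds);
  if (socketBind !== undefined) {
    return { kind: "socket", bind: socketBind };
  }
  if (watcherTcp !== undefined) {
    return { kind: "tcp", host: watcherTcp.host, port: watcherTcp.port };
  }
  return undefined; // neither path is available
}
```

Checking the bind before the TCP connection is what keeps the helper independent of a socket proxy that the update itself is about to stop.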

Fixed

  • Dashboard Host Status widget no longer auto-scrolls to the last host when the host list changes (commit cbe815a6). The full-mode host list used scroll-snap-type: y mandatory with a measured tail spacer. Whenever the host-row set changed — a watcher or agent added, removed, or renamed, or a full-to-compact mode transition — Chromium re-snapped to the last row's snap point, leaving only the final host visible above a large empty gap. The scroll-snap classes (snap-y, snap-mandatory, snap-start), the dynamic tail-spacer element, and the measurement machinery behind it (the onUpdated hook, requestAnimationFrame scheduler, and ResizeObserver-triggered recompute) have all been removed. The content-aware full/compact sizing that keeps whole rows visible was already sufficient; the snapping added no functional value and actively fought the layout on every data change.

  • Dashboard Resource Usage widget minimum height raised so per-container CPU and Memory lists stay visible (commit 59719757). The resource-usage widget's minH was set to 3 grid units (approximately 122 px), which falls below the 180 px threshold at which the per-container lists collapse out of view. The minimum is now 7 grid units (approximately 306 px). applyConstraints clamps any saved layout item that is below the new minimum on load, so existing dashboard configurations with a shrunken resource-usage widget are silently corrected on the next render rather than persisting an unusable layout.

  • AgentClient timers are now cleared when an agent is removed, preventing orphaned timeouts (commit 03bf7211). AgentClient maintains two setTimeout handles — stableConnectionTimer (arms 30 s after the SSE response arrives to reset the backoff counter) and reconnectTimer (fires the next reconnect attempt after the exponential-backoff delay). Neither was cancelled when removeAgent spliced the client out of the manager's list. An agent removed mid-reconnect-cycle or mid-stability-window would keep an armed timer alive indefinitely, potentially triggering a startSse call against a client that was no longer tracked and leaking the associated resources. A new idempotent stop() method on AgentClient cancels both timers and nulls the handles; removeAgent now calls stop() on each matching client before splicing it.
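An idempotent `stop()` of the kind described above might look like this sketch. The class `ReconnectingClient` and its scheduling methods are hypothetical; only the two timer names and the cancel-then-null behaviour come from the entry.

```typescript
type TimerHandle = ReturnType<typeof setTimeout>;

class ReconnectingClient {
  private stableConnectionTimer: TimerHandle | null = null;
  private reconnectTimer: TimerHandle | null = null;

  // Arm the timer that resets the backoff counter once a connection has
  // stayed up for the whole stability window.
  armStabilityWindow(windowMs: number, onStable: () => void): void {
    if (this.stableConnectionTimer !== null) {
      clearTimeout(this.stableConnectionTimer);
    }
    this.stableConnectionTimer = setTimeout(onStable, windowMs);
  }

  // Schedule the next reconnect attempt after the backoff delay.
  scheduleReconnect(delayMs: number, reconnect: () => void): void {
    if (this.reconnectTimer !== null) {
      clearTimeout(this.reconnectTimer);
    }
    this.reconnectTimer = setTimeout(reconnect, delayMs);
  }

  // Cancel both timers and null the handles; calling it twice is harmless.
  stop(): void {
    if (this.stableConnectionTimer !== null) {
      clearTimeout(this.stableConnectionTimer);
      this.stableConnectionTimer = null;
    }
    if (this.reconnectTimer !== null) {
      clearTimeout(this.reconnectTimer);
      this.reconnectTimer = null;
    }
  }

  hasPendingTimers(): boolean {
    return this.stableConnectionTimer !== null || this.reconnectTimer !== null;
  }
}
```

A removal routine would call `stop()` on each matching client before dropping its reference, so no callback can fire against an untracked client.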

Security

  • TCP Docker host is validated before the self-update controller passes it to Dockerode (commit 441b4358). DD_SELF_UPDATE_DOCKER_HOST was forwarded to Dockerode without sanitization. A new validateTcpDockerHost function rejects values that contain a URL scheme prefix (tcp://, http://, https://, or any <scheme>:// form), a userinfo segment (@), whitespace, or path separators (/ or \), throwing a descriptive error before any network connection is attempted. This prevents an environment variable or compose-file value from inadvertently injecting a path or URL component that Dockerode would interpret in an unexpected way. The validated host and resolved port are also logged at INFO level so the connection target is auditable in the container logs. runSelfUpdateController was additionally refactored to remove a control-flow asymmetry: socket and TCP paths previously diverged into separate Dockerode-construct-and-run blocks; they now share a single tail (disableSocketRedirects remains socket-only).
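The rejection rules listed above can be sketched as a small validator. This is an illustration, not the project's implementation: the error messages are invented, and returning the host unchanged on success is an assumption about the interface.

```typescript
// Reject values that would smuggle a scheme, credentials, whitespace, or a
// path into what should be a bare host name or address.
function validateTcpDockerHost(host: string): string {
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(host)) {
    throw new Error(`Docker host must not include a URL scheme: ${host}`);
  }
  if (host.includes("@")) {
    throw new Error(`Docker host must not include userinfo: ${host}`);
  }
  if (/\s/.test(host)) {
    throw new Error("Docker host must not contain whitespace");
  }
  if (/[\\/]/.test(host)) {
    throw new Error(`Docker host must not contain path separators: ${host}`);
  }
  return host;
}

console.log(validateTcpDockerHost("socket-proxy")); // prints "socket-proxy"
```

The scheme check runs before the path-separator check so that a value like `tcp://proxy` produces the more specific error.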

  • OIDC error logs now redact RFC-1918 IP addresses and absolute filesystem paths (commit 9b79de77). The rc.22 getErrorChainMessage improvement walks error.cause chains up to depth 5 and appends the results to OIDC warn logs, which is the right diagnostic behaviour — but TLS and connection errors in Node/undici frequently include private network addresses (e.g. connect ECONNREFUSED 10.0.0.5:2376) and absolute filesystem paths (e.g. error loading /etc/ssl/certs/ca-bundle.pem) that should not appear in logs shipped to centralised observability systems. sanitizeOidcErrorMessage now applies two additional redaction passes after the existing URL and bearer-token passes: RFC-1918 IPv4 ranges (10.x, 172.16–31.x, 192.168.x) with an optional port are replaced with [internal-addr]; absolut...


v1.5.0-rc.22

16 May 00:08
0ddf007

Pre-release

[1.5.0-rc.22] — 2026-05-15

Added

  • All 16 non-English locales now have full key parity with the English source (commits 5e463631, 012dcb83). Two complementary passes bring every translation up to date. The first pass (5e463631) filled gaps in the ten locales that were already mostly translated (de, es, fr, it, nl, pl, pt-BR, tr, zh-CN, zh-TW) — each was missing notificationOutboxView.json entirely and had drifted behind recent string extractions in listViews.json, containerComponents.json, containersView.json, and dashboardView.json (new keys: digestLabel, blockedTag variants, manualUpdateOnly variants, narrowViewportSuffix, autoHiddenBadgeTooltip, queued-update toast variants, recentUpdates.widgetAria). A JSON-breaking typo in de/dashboardView.json (straight quote instead of closing curly quote) was also corrected. The second pass (012dcb83) gave the six stub locales (ar, ja, ko, ru, uk, vi) — which had been scaffolded with English placeholders since rc.20 — a full translation pass across all 13 namespace files plus the new notificationOutboxView.json. Brand names, acronyms, and interpolation placeholders are preserved verbatim; DevOps terminology follows each language's established conventions.

Changed

  • Playwright E2E tests moved to a dedicated workflow file (e2e-playwright.yml) (commit f0989301). OSSF Scorecard's CI-Tests check scores from the github-actions Check Suite conclusion, not individual check-run conclusions. Because every job in ci-verify.yml rolled into a single suite, one failing Playwright assertion would flip the entire suite to failure and cause Scorecard to mark merged PRs as untested — even when all other jobs were green (manifesting as code-scanning alert #43, CI-Tests score 9/10). Each workflow file gets its own Check Suite per commit; isolating Playwright into e2e-playwright.yml means a Playwright failure no longer drags the ci-verify suite down for Scorecard's purposes. Branch protection continues to gate on the "🎭 E2E: Playwright" status check (matched by job name, not workflow file), and release-cut.yml now polls both workflows on the target SHA so releases still require Playwright success.

Fixed

  • #368 — OIDC custom-dispatcher paths (cafile / DD_AUTH_OIDC_*_INSECURE=true) no longer fail with an opaque TypeError: fetch failed on Node 24. Node 24 ships built-in undici 7.21.0 (v1 dispatcher interface) while the app's userland undici@8 (bumped in rc.20) exposes an Agent with the v2 dispatcher interface. The OIDC custom fetch was constructing the v2 Agent from userland undici and passing it as dispatcher to Node's global fetch, which is bound to the built-in undici 7. The v2 Agent's handlers don't satisfy the v1 contract, so the request silently fails — the surface symptom reported by a user upgrading rc.19 → rc.21 against self-signed Authentik with DD_AUTH_OIDC_AUTHENTIK_INSECURE=true. The undici project's Dispatcher1Wrapper bridge (nodejs/undici#4827) covers this mismatch on Node 22 but is absent on Node 24. The fix imports fetch from undici and uses it whenever a custom dispatcher is required (cafile or insecure path) so both halves share the same dispatcher version. The non-insecure code path is unchanged — openid-client continues to use its default fetch when no custom dispatcher is needed. A strict-tsc type error introduced in the same fix (undici's nominal RequestInfo/Response types differ from the lib.dom types that openid-client's CustomFetch is typed against) was resolved by casting through unknown at the boundary using Parameters<typeof undiciFetch> and ReturnType<openidClientLibrary.CustomFetch>; there is no runtime behavior change.

  • OIDC warn logs now surface the full error.cause chain, making TLS and DNS failures actionable (commit 720d99a3). undici's fetch surfaces failures as a generic TypeError: fetch failed; the actionable diagnostic (ENOTFOUND, ECONNREFUSED, UNABLE_TO_VERIFY_LEAF_SIGNATURE, etc.) lives on error.cause, sometimes nested. The previous error sanitizer logged only the top-level message, so issue #368 reached us with only "Unable to initialize OIDC session (fetch failed)" — no indication whether DNS, TLS, or routing was at fault. A new getErrorChainMessage helper walks error.cause up to depth 5, joining the parts with a separator and appending [code] when a code property is present; a WeakSet guards against cyclic cause chains. sanitizeOidcErrorMessage now uses it so all OIDC warn logs include the cause chain (still passed through the existing URL and token redaction). This is a forward-only diagnostic improvement with no runtime behavior change for healthy OIDC paths.

  • #362 — SSE reconnect exponential backoff no longer collapses to a flat 1 s loop when the agent is struggling. AgentClient.startSse() previously called this.reconnectAttempts = 0 the instant the axios response headers arrived — before the stream had proven it could stay open. A crash-looping agent, a reverse-proxy with a short upstream idle timeout, or any situation where the SSE stream returned HTTP 200 and then ended almost immediately would cycle as: connect → 200 → reconnectAttempts = 0 → stream ends → scheduleReconnect() (delay = 1 000 ms, attempts → 1) → 1 s later connect → 200 → reconnectAttempts = 0 again — and so on forever. The user who filed #362 saw SSE stream ended. Reconnecting... in their controller logs every ~1.00 s indefinitely, with no escalation. The backoff now only resets after the stream has stayed open for SSE_STABLE_CONNECTION_MS (30 s). A setTimeout is armed when the response arrives and cancelled by scheduleReconnect() if the stream ends or errors before the window expires; streams that end early therefore keep their accumulated reconnectAttempts and the delay continues to double up to the 60 s cap as intended.
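
The corrected backoff behaviour from #362 can be sketched as follows. This is a simplified model, not the actual AgentClient code: instead of arming a setTimeout, it takes the time the stream stayed open as an argument. The names BackoffSketch, onStreamEnded, and reconnectDelayMs are assumptions; the 1 s base delay, the doubling, the 60 s cap, and the 30 s stability window come from the note above.

```typescript
const SSE_STABLE_CONNECTION_MS = 30_000; // stability window named in the release note
const BASE_DELAY_MS = 1_000; // first reconnect delay
const MAX_DELAY_MS = 60_000; // cap on the doubled delay

// Delay before the next attempt, given how many attempts have already been scheduled.
function reconnectDelayMs(attempts: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempts, MAX_DELAY_MS);
}

class BackoffSketch {
  reconnectAttempts = 0;

  // Called when a stream ends or errors; openForMs is how long it stayed open.
  // The counter is reset only if the stream outlived the stability window.
  onStreamEnded(openForMs: number): number {
    if (openForMs >= SSE_STABLE_CONNECTION_MS) {
      this.reconnectAttempts = 0;
    }
    const delay = reconnectDelayMs(this.reconnectAttempts);
    this.reconnectAttempts += 1;
    return delay;
  }
}

// A stream that keeps closing after 50 ms now escalates instead of looping at 1 s.
const backoff = new BackoffSketch();
const delays = Array.from({ length: 8 }, () => backoff.onStreamEnded(50));
console.log(delays.join(", "));
```

Under the old behaviour the counter was reset as soon as the response headers arrived, so every entry in that list would have been 1000.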

v1.5.0-rc.21

15 May 13:23
ae8c6a6

v1.5.0-rc.21 Pre-release

[1.5.0-rc.21] — 2026-05-15

Added

  • i18n coverage extended to the notification outbox, notification rules, and registry/server status badges (discussion #329). Four UI surfaces still rendered hardcoded English even under the zh-CN locale: NotificationOutboxView (tab labels, table headers, action buttons, toast messages), the notification rule name/description column in NotificationsView, and the connection-status badges in RegistriesView and ServersView. All four are now extracted into the existing t() catalogs (new notificationOutboxView namespace; new entries under notificationsView.rules.*, registriesView.status.*, and serversView.status.*). Rule names and statuses use te() so backend-supplied custom names fall back to the raw string when the catalog has no entry. Translation files for other locales will pick up the new keys on the next Crowdin sync.

  • DD_AGENT_ALLOW_INSECURE_SECRET escape hatch for closed-LAN deployments. rc.20 tightened the agent-secret-over-HTTP check from a warning to a hard error in app/agent/AgentClient.ts. rc.21 introduces DD_AGENT_ALLOW_INSECURE_SECRET=true as an explicit controller-side opt-in for environments (isolated private LANs, air-gapped setups) where the operator accepts that the agent secret travels in cleartext. Default behavior is unchanged — without the flag the boot-time error is still thrown. When the flag is set to exactly true, the error is downgraded to a log.warn on every startup so the security signal is preserved and visible in logs. Any other value (e.g. 1, yes, TRUE) continues to throw. See content/docs/current/configuration/agents/index.mdx for guidance on recommended alternatives (certfile/cafile, reverse proxy TLS termination).
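
A minimal sketch of the opt-in check, assuming the controller knows the agent URL and receives its environment as a plain object. The function name checkAgentSecretTransport, its parameters, and the message texts are hypothetical; only the variable name and the "exactly true" rule come from the note above.

```typescript
type Env = Record<string, string | undefined>;

// Throws when an agent secret would travel over plain HTTP, unless the operator
// opted in with DD_AGENT_ALLOW_INSECURE_SECRET set to exactly "true".
function checkAgentSecretTransport(agentUrl: string, env: Env, warn: (message: string) => void): void {
  if (new URL(agentUrl).protocol !== "http:") {
    return; // HTTPS (or any non-HTTP scheme) is outside the scope of this check
  }
  if (env["DD_AGENT_ALLOW_INSECURE_SECRET"] === "true") {
    warn(`Agent secret for ${agentUrl} is sent in cleartext (DD_AGENT_ALLOW_INSECURE_SECRET=true)`);
    return;
  }
  throw new Error(`Refusing to send the agent secret over plain HTTP to ${agentUrl}`);
}

// Usage: the flag downgrades the error to a warning; values such as "TRUE" or "1" do not.
checkAgentSecretTransport("http://agent.lan:3000", { DD_AGENT_ALLOW_INSECURE_SECRET: "true" }, console.warn);
```

Comparing against the literal string keeps the opt-in explicit: a typo or a loosely "truthy" value fails closed instead of silently disabling the protection.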

Changed

  • Default watcher cron relaxed from hourly to every 6 hours (#342 follow-up). app/watchers/providers/docker/Docker.ts now defaults cron to 0 */6 * * * (every 6 hours) instead of 0 * * * * (hourly). Hourly polling was the most aggressive default among active 2026 update managers — Diun ships 0 */6 * * *, Watchtower (archived Dec 2025) defaulted to 24 h, and our upstream WUD ships no default at all. With fleets of 20+ containers and image tag lists that paginate to thousands of entries (immich-server is 24+ GHCR pages), hourly polling saturates anonymous Docker Hub limits (100 pulls / 6 h) and trips GitHub's 5 k req/h release-notes ceiling. rc.20's per-host token bucket + Retry-After handling stays as the safety net; the default change addresses the root cause. Users who set DD_WATCHER_{name}_CRON explicitly are unaffected. Users who want near-real-time detection (security patches) can still set DD_WATCHER_{name}_CRON=0 * * * *. Docs (content/docs/current/configuration/watchers/index.mdx, content/docs/current/api/watcher.mdx, content/docs/current/api/agent.mdx) updated to reflect the new default.

Fixed

  • discussion #295 — Release-notes icon in the container table now always opens the same popover, even when only an external release URL is available. Previously the file-text icon in the actions column had two different behaviors depending on what release-notes metadata we'd fetched for the container: containers with structured notes (title + body, e.g. from the GitHub release-notes provider) got an icon button that opened a popover with an expandable preview; containers with only a bare releaseLink URL got an icon that was a direct external <a> — no popover. The popover shell now renders uniformly for both cases. When only releaseLink is available, the popover contains a single row that links out to the external URL (with an external-link indicator instead of the chevron used by expandable rows), and clicking the row dismisses the popover before navigating. No change for containers that already have structured notes — the existing popover and inline-expander behavior are unchanged.

  • Docker multi-arch build no longer fails when Alpine repos drift between archs. Dockerfile pinned curl=8.17.0-r1, which broke the linux/arm64 build leg after Alpine's latest-stable/aarch64 mirror rotated to curl-8.19.0-r0 while latest-stable/x86_64 was still on 8.17.0-r1. No single pin can satisfy both archs during a mirror rotation window; the curl entry is now unpinned so apk installs whatever's current per-arch. Other pinned packages still match across both archs and stay pinned.

  • #362 — DD_SESSION_SECRET no longer crashes startup when unset; secret is auto-generated and persisted to the store on first boot. rc.20 made DD_SESSION_SECRET a hard requirement (commit b9e8be38) to close a real issue — the prior fallback generated a fresh per-process random secret on every restart, which silently invalidated every active session whenever drydock restarted. But the hard-require shipped without a migration path: existing deployments that didn't set the variable hit an immediate boot crash on upgrade. The fallback is now restored as a persisted secret: on first boot without DD_SESSION_SECRET set, drydock generates 64 random bytes (randomBytes(64).toString('hex')) and writes them to a new secrets collection inside /store/dd.json. Subsequent boots read the persisted value, so sessions survive restarts. The env var still takes precedence when set (and whitespace-only values are treated as unset). Operators upgrading from rc.20 with no DD_SESSION_SECRET configured will boot cleanly; deployments that already set the variable see no change.
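
The resolution order described above (environment variable first, then the persisted value, then a freshly generated secret) might be sketched like this. The SecretStore interface, the sessionSecret key, and the function name resolveSessionSecret are assumptions; the real implementation persists to a secrets collection in /store/dd.json, which is replaced here by a tiny key-value interface.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical stand-in for the persisted secrets collection.
interface SecretStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

function resolveSessionSecret(envValue: string | undefined, store: SecretStore): string {
  // 1. The env var wins when set; whitespace-only values count as unset.
  if (envValue !== undefined && envValue.trim() !== "") {
    return envValue;
  }
  // 2. Otherwise reuse the secret persisted by an earlier boot.
  const persisted = store.get("sessionSecret");
  if (persisted !== undefined) {
    return persisted;
  }
  // 3. First boot: generate 64 random bytes (128 hex characters) and persist them.
  const generated = randomBytes(64).toString("hex");
  store.set("sessionSecret", generated);
  return generated;
}
```

Because the generated value is written once and read on later boots, restarting the process no longer invalidates existing sessions.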

v1.5.0-rc.20

14 May 18:23
18a31f2

v1.5.0-rc.20 Pre-release

[1.5.0-rc.20] — 2026-05-14

Added

  • Fleet-aggregate stats subsystem (commits feature/v1.5-rc17). New ContainerStatsAggregator polls each locally-monitored container once per tick (default 10 s) and computes a fleet-wide ContainerStatsSummary (total CPU%, total memory, top-N rows). Two new endpoints — GET /api/v1/stats/summary and GET /api/v1/stats/summary/stream — expose the current snapshot and a live SSE feed; the dashboard Resource Usage widget now consumes the SSE stream directly, fixing the regression (introduced in rc.13 by the ?touch=false workaround) where the widget showed zeros because the per-container cache was never warmed. The legacy GET /api/v1/containers/stats endpoint and the client-side summarizeContainerResourceUsage rollup have been removed.

  • Per-container update locks (commit 761fb834). New keyed LockManager primitive in app/updates/lock-primitives.ts replaces the module-level pLimit(1) that was serialising every container update across the entire process. Lock keys are derived per container (and per compose project for Dockercompose), so two unrelated containers can now pull and recreate concurrently while two services in the same compose project still serialise correctly. The lock primitive is its own pure-logic file with full unit tests; the docker trigger and compose subclass derive the lock key set via a new getUpdateLockKeys(container) method.

  • Restart recovery for queued and pulling updates (commit 00788b13). Startup reconciliation in app/store/update-operation.ts is now selective: status=queued operations stay queued for the recovery dispatcher to pick up, and phase=pulling rows are reset to queued (pull is idempotent). All other in-progress phases — prepare, renamed, new-created, old-stopped, new-started, health-gate, rollback-* — remain marked failed because they leave inconsistent state that an operator should review. A new app/updates/recovery.ts module runs once after registry.init(), re-resolves trigger and container for each queued operation, and dispatches them through the existing fire-and-forget pipeline. Operations whose container or trigger no longer exists are marked failed with an explanatory lastError so they don't sit in the queue forever.

  • Notification outbox with retry and dead-letter queue (commits a9561d93, 7d2ef6eb, b215d295, ce26bece). New notificationOutbox LokiJS collection (app/store/notification-outbox.ts) and matching app/notifications/outbox-worker.ts background worker provide durable retry semantics for notification dispatch. Trigger.dispatchContainerForEvent now optimistically calls this.trigger(container) directly; on failure, the delivery intent is persisted to the outbox and the worker retries on a periodic drain with exponential backoff + jitter. After a configurable number of failed attempts (default 5) entries transition to the dead-letter queue; delivered and dead-letter entries are auto-purged past TTL (default 30 days). New /api/notifications/outbox REST surface lets operators list entries (?status= filter), retry from the DLQ (POST /:id/retry), or discard (DELETE /:id). New base method Trigger.dispatchOutboxEntry(entry) is the worker's delivery hook; subclasses can override.

  • Notification outbox UI (commit feature/v1.5-rc17). New Notification outbox page (route /notifications/outbox, nav under Settings) consumes the existing /api/notifications/outbox REST surface so operators can review the dead-letter queue, retry stuck deliveries, or discard dead entries from the UI. Status tabs (Dead-letter / Pending / Delivered) keep the same query-param convention (?status=) used by the rest of the list views; counts per bucket render as inline badges. Retry is shown only on dead-letter rows; Discard is available everywhere. New ui/src/services/notification-outbox.ts mirrors the API exactly.

  • Cancel queued or in-flight updates (commits 4b79e3ac, 79487115). POST /api/operations/:id/cancel now accepts both queued and in-progress operations. Queued ops are marked failed immediately with lastError: 'Cancelled by operator' (200). In-progress ops are flagged via a new cancelRequested field on the operation row and the endpoint returns 202 Accepted; the lifecycle observes the flag at three safe checkpoints — after pull and before rename (clean abort, no rollback needed), before creating the replacement container, and before stopping the old container — so cancellations either short-circuit cleanly or fall through the existing rollback path that renames the container back. The rollback path tags the rollback reason as cancelled so the audit trail distinguishes operator cancellations from real failures. Already-terminal ops still return 409 Conflict. The container row's Cancel action is now visible for both queued and in-progress operations; the toast says "Cancelled" for the immediate path and "Cancellation requested" for the in-progress path.

  • Global concurrent-update cap (DD_UPDATE_MAX_CONCURRENT). New counting semaphore (Semaphore class in app/updates/lock-primitives.ts) provides a configurable global gate on how many update lifecycles run simultaneously across the entire controller instance. Default 0 = unlimited — no behavior change on upgrade. Positive integer N means at most N updates run concurrently. Negative or non-integer values fail fast at startup with a descriptive error. The cap layers on top of the existing per-container and per-compose-project locks; it does not replace them. Operations waiting on the cap remain in queued status. Scope is per controller instance; distributed agent hosts have independent counters by design. Self-update operations bypass the global cap — they take per-container locks but never wait on the global semaphore, preventing a full update queue from starving an admin-triggered self-update.

  • Health-gate SSE heartbeat (DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS). While drydock waits for a new container to pass its health gate, the SSE pipeline was silent for the entire wait — the UI received no events between phase: 'health-gate' and phase: 'health-gate-passed'. For images with long healthcheck intervals (e.g. vaultwarden's 60 s check) this meant the UI had to rely on the REST reconciliation poll if the SSE connection was interrupted during that window. A periodic heartbeat now re-emits phase: 'health-gate' at a configurable interval (default 10 s). DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS=0 disables heartbeats entirely; values below 1000 ms or non-integers fail fast at startup. The heartbeat cancels immediately when the wait resolves in any direction (success, timeout, or unhealthy), ensuring the terminal event is never preempted. No new phases are introduced; existing UI consumers accept the re-emitted event unchanged.
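
The validation rules for DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS and the "cancel on any outcome" behaviour could be sketched as below. The function names parseHealthGateHeartbeatMs and waitWithHeartbeat are hypothetical, and treating an empty string like an unset variable is an assumption the release note does not confirm.

```typescript
const DEFAULT_HEARTBEAT_MS = 10_000; // default interval from the release note

// Returns the heartbeat interval in ms; 0 means heartbeats are disabled.
function parseHealthGateHeartbeatMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_HEARTBEAT_MS; // assumption: empty is treated like unset
  }
  const value = Number(raw);
  if (!Number.isInteger(value)) {
    throw new Error(`DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS must be an integer, got "${raw}"`);
  }
  if (value === 0) {
    return 0;
  }
  if (value < 1000) {
    throw new Error(`DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS must be 0 or at least 1000, got ${value}`);
  }
  return value;
}

// Re-emits a heartbeat while `wait` is pending and stops as soon as it settles,
// whether it resolves or rejects, so no heartbeat follows the terminal event.
async function waitWithHeartbeat<T>(wait: Promise<T>, intervalMs: number, emit: () => void): Promise<T> {
  if (intervalMs === 0) {
    return wait;
  }
  const timer = setInterval(emit, intervalMs);
  try {
    return await wait;
  } finally {
    clearInterval(timer);
  }
}
```

Placing clearInterval in a finally block covers success, timeout, and unhealthy outcomes with a single code path.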

Changed

  • Crowdin export configuration aligned with app locale folders. Crowdin now maps language codes such as es-ES into the locale folder IDs the UI actually loads (for example es) and only downloads languages exposed in the locale picker. A new config guard test prevents future sync PRs from adding ignored region-coded folders, and the new auto-hidden-columns tooltip source avoids English-only column(s) punctuation that triggered Crowdin QA warnings for translated strings.
  • Shared DataTable column sizing overhaul (commit 596adcd2). All first-party table surfaces now route through the shared DataTable component with numeric sizing metadata (size, minSize, maxSize, flex, priority, overflow, autoSize) instead of ad-hoc string widths. Tables render a stable <colgroup>, keep actions in an independent sticky/fixed managed column, support pointer and keyboard column resizing, autosize a column to its visible content on double-click, and persist manual/autosized widths per table via browser preferences. Containers uses the sizing data for responsive auto-hide math so narrow widths hide lower-priority metadata instead of shrinking columns below readable minimums. The config webhook endpoint list was migrated too, and a new architecture test fails if raw <table> markup or string column widths reappear in ui/src.
  • Watcher dispatch is fully fire-and-forget (commit 5cfa2286). Trigger.runUpdateAvailableSimpleTrigger and runAcceptedUpdateBatch previously awaited runAcceptedContainerUpdates, so a slow update lifecycle stalled the next watcher tick. The API path was already fire-and-forget; the watcher path now matches. New dispatchAccepted(accepted) helper centralises the void runAcceptedContainerUpdates(...).catch(() => undefined) pattern across all four call sites. Per-operation failures are still terminalised inside the lifecycle handler, so swallowing the dispatch chain's rejection loses no observable information.
  • Security alert emit is non-blocking inside the update lifecycle (commit 6c5198dd). SecurityGate.maybeEmitHighSeverityAlert was awaited inside evaluateScanOutcome, which itself runs inside the update lifecycle's critical path. With multiple notifiers registered for security alerts, the await chained sequential provider calls (SMTP, Slack, HTTP, MQTT, webhook) into the lifecycle, multiplying latency before pull/recreate could even start. The function now returns synchronously after firing the emit; notification dispatch semantics from t...

v1.5.0-rc.19

12 May 00:00
7dcbb5e

v1.5.0-rc.19 Pre-release

[1.5.0-rc.19] — 2026-05-12

Added

  • Fleet-aggregate stats subsystem (commits feature/v1.5-rc17). New ContainerStatsAggregator polls each locally-monitored container once per tick (default 10 s) and computes a fleet-wide ContainerStatsSummary (total CPU%, total memory, top-N rows). Two new endpoints — GET /api/v1/stats/summary and GET /api/v1/stats/summary/stream — expose the current snapshot and a live SSE feed; the dashboard Resource Usage widget now consumes the SSE stream directly, fixing the regression (introduced in rc.13 by the ?touch=false workaround) where the widget showed zeros because the per-container cache was never warmed. The legacy GET /api/v1/containers/stats endpoint and the client-side summarizeContainerResourceUsage rollup have been removed.

  • Per-container update locks (commit 761fb834). New keyed LockManager primitive in app/updates/lock-primitives.ts replaces the module-level pLimit(1) that was serialising every container update across the entire process. Lock keys are derived per container (and per compose project for Dockercompose), so two unrelated containers can now pull and recreate concurrently while two services in the same compose project still serialise correctly. The lock primitive is its own pure-logic file with full unit tests; the docker trigger and compose subclass derive the lock key set via a new getUpdateLockKeys(container) method.

  • Restart recovery for queued and pulling updates (commit 00788b13). Startup reconciliation in app/store/update-operation.ts is now selective: status=queued operations stay queued for the recovery dispatcher to pick up, and phase=pulling rows are reset to queued (pull is idempotent). All other in-progress phases — prepare, renamed, new-created, old-stopped, new-started, health-gate, rollback-* — remain marked failed because they leave inconsistent state that an operator should review. A new app/updates/recovery.ts module runs once after registry.init(), re-resolves trigger and container for each queued operation, and dispatches them through the existing fire-and-forget pipeline. Operations whose container or trigger no longer exists are marked failed with an explanatory lastError so they don't sit in the queue forever.

  • Notification outbox with retry and dead-letter queue (commits a9561d93, 7d2ef6eb, b215d295, ce26bece). New notificationOutbox LokiJS collection (app/store/notification-outbox.ts) and matching app/notifications/outbox-worker.ts background worker provide durable retry semantics for notification dispatch. Trigger.dispatchContainerForEvent now optimistically calls this.trigger(container) directly; on failure, the delivery intent is persisted to the outbox and the worker retries on a periodic drain with exponential backoff + jitter. After a configurable number of failed attempts (default 5) entries transition to the dead-letter queue; delivered and dead-letter entries are auto-purged past TTL (default 30 days). New /api/notifications/outbox REST surface lets operators list entries (?status= filter), retry from the DLQ (POST /:id/retry), or discard (DELETE /:id). New base method Trigger.dispatchOutboxEntry(entry) is the worker's delivery hook; subclasses can override.

  • Notification outbox UI (commit feature/v1.5-rc17). New Notification outbox page (route /notifications/outbox, nav under Settings) consumes the existing /api/notifications/outbox REST surface so operators can review the dead-letter queue, retry stuck deliveries, or discard dead entries from the UI. Status tabs (Dead-letter / Pending / Delivered) keep the same query-param convention (?status=) used by the rest of the list views; counts per bucket render as inline badges. Retry is shown only on dead-letter rows; Discard is available everywhere. New ui/src/services/notification-outbox.ts mirrors the API exactly.

  • Cancel queued or in-flight updates (commits 4b79e3ac, 79487115). POST /api/operations/:id/cancel now accepts both queued and in-progress operations. Queued ops are marked failed immediately with lastError: 'Cancelled by operator' (200). In-progress ops are flagged via a new cancelRequested field on the operation row and the endpoint returns 202 Accepted; the lifecycle observes the flag at three safe checkpoints — after pull and before rename (clean abort, no rollback needed), before creating the replacement container, and before stopping the old container — so cancellations either short-circuit cleanly or fall through the existing rollback path that renames the container back. The rollback path tags the rollback reason as cancelled so the audit trail distinguishes operator cancellations from real failures. Already-terminal ops still return 409 Conflict. The container row's Cancel action is now visible for both queued and in-progress operations; the toast says "Cancelled" for the immediate path and "Cancellation requested" for the in-progress path.

  • Global concurrent-update cap (DD_UPDATE_MAX_CONCURRENT). New counting semaphore (Semaphore class in app/updates/lock-primitives.ts) provides a configurable global gate on how many update lifecycles run simultaneously across the entire controller instance. Default 0 = unlimited — no behavior change on upgrade. Positive integer N means at most N updates run concurrently. Negative or non-integer values fail fast at startup with a descriptive error. The cap layers on top of the existing per-container and per-compose-project locks; it does not replace them. Operations waiting on the cap remain in queued status. Scope is per controller instance; distributed agent hosts have independent counters by design. Self-update operations bypass the global cap — they take per-container locks but never wait on the global semaphore, preventing a full update queue from starving an admin-triggered self-update.

  • Health-gate SSE heartbeat (DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS). While drydock waits for a new container to pass its health gate, the SSE pipeline was silent for the entire wait — the UI received no events between phase: 'health-gate' and phase: 'health-gate-passed'. For images with long healthcheck intervals (e.g. vaultwarden's 60 s check) this meant the UI had to rely on the REST reconciliation poll if the SSE connection was interrupted during that window. A periodic heartbeat now re-emits phase: 'health-gate' at a configurable interval (default 10 s). DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS=0 disables heartbeats entirely; values below 1000 ms or non-integers fail fast at startup. The heartbeat cancels immediately when the wait resolves in any direction (success, timeout, or unhealthy), ensuring the terminal event is never preempted. No new phases are introduced; existing UI consumers accept the re-emitted event unchanged.
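
For the DD_UPDATE_MAX_CONCURRENT cap listed above, a counting semaphore with "0 = unlimited" semantics might look like the following sketch. SemaphoreSketch and parseMaxConcurrent are hypothetical names, and the real Semaphore class in app/updates/lock-primitives.ts may differ in shape; only the default, the validation rules, and the queue-while-waiting behaviour come from the release note.

```typescript
// Parses the cap: unset means 0 (unlimited); negative or non-integer values fail fast.
function parseMaxConcurrent(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return 0; // assumption: empty is treated like unset
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`DD_UPDATE_MAX_CONCURRENT must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

class SemaphoreSketch {
  private readonly limit: number;
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  // Resolves once a slot is free; with limit 0 it never waits.
  async acquire(): Promise<void> {
    if (this.limit === 0 || this.active < this.limit) {
      this.active += 1;
      return;
    }
    // Queue up; release() hands its slot over directly, so `active` stays unchanged.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Frees a slot, or passes it straight to the longest-waiting caller.
  release(): void {
    const next = this.waiters.shift();
    if (next !== undefined) {
      next();
      return;
    }
    this.active -= 1;
  }

  get running(): number {
    return this.active;
  }

  get waiting(): number {
    return this.waiters.length;
  }
}
```

Handing the slot directly to the next waiter avoids a window in which a newly arriving caller could overtake an operation that has been queued longer.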

Changed

  • Crowdin export configuration aligned with app locale folders. Crowdin now maps language codes such as es-ES into the locale folder IDs the UI actually loads (for example es) and only downloads languages exposed in the locale picker. A new config guard test prevents future sync PRs from adding ignored region-coded folders, and the new auto-hidden-columns tooltip source avoids English-only column(s) punctuation that triggered Crowdin QA warnings for translated strings.
  • Shared DataTable column sizing overhaul (commit 596adcd2). All first-party table surfaces now route through the shared DataTable component with numeric sizing metadata (size, minSize, maxSize, flex, priority, overflow, autoSize) instead of ad-hoc string widths. Tables render a stable <colgroup>, keep actions in an independent sticky/fixed managed column, support pointer and keyboard column resizing, autosize a column to its visible content on double-click, and persist manual/autosized widths per table via browser preferences. Containers uses the sizing data for responsive auto-hide math so narrow widths hide lower-priority metadata instead of shrinking columns below readable minimums. The config webhook endpoint list was migrated too, and a new architecture test fails if raw <table> markup or string column widths reappear in ui/src.
  • Watcher dispatch is fully fire-and-forget (commit 5cfa2286). Trigger.runUpdateAvailableSimpleTrigger and runAcceptedUpdateBatch previously awaited runAcceptedContainerUpdates, so a slow update lifecycle stalled the next watcher tick. The API path was already fire-and-forget; the watcher path now matches. New dispatchAccepted(accepted) helper centralises the void runAcceptedContainerUpdates(...).catch(() => undefined) pattern across all four call sites. Per-operation failures are still terminalised inside the lifecycle handler, so swallowing the dispatch chain's rejection loses no observable information.
  • Security alert emit is non-blocking inside the update lifecycle (commit 6c5198dd). SecurityGate.maybeEmitHighSeverityAlert was awaited inside evaluateScanOutcome, which itself runs inside the update lifecycle's critical path. With multiple notifiers registered for security alerts, the await chained sequential provider calls (SMTP, Slack, HTTP, MQTT, webhook) into the lifecycle, multiplying latency before pull/recreate could even start. The function now returns synchronously after firing the emit; notification dispatch semantics from t...
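
The fire-and-forget dispatch from the watcher change in the Changed list above can be illustrated with a short sketch. The AcceptedUpdate type and the second parameter are assumptions added so the example is self-contained; the release note describes dispatchAccepted(accepted) as taking only the accepted list.

```typescript
// Hypothetical shape of an accepted update; the real type carries more fields.
type AcceptedUpdate = { operationId: string };

// Starts the update lifecycle without awaiting it. The trailing catch only prevents an
// unhandled rejection; per the release note, each operation's failure is already
// recorded inside the lifecycle handler.
function dispatchAccepted(
  accepted: AcceptedUpdate[],
  runAcceptedContainerUpdates: (accepted: AcceptedUpdate[]) => Promise<void>,
): void {
  void runAcceptedContainerUpdates(accepted).catch(() => undefined);
}

// Usage: the caller continues immediately, even though the lifecycle takes 50 ms.
dispatchAccepted([{ operationId: "op-1" }], async () => {
  await new Promise<void>((resolve) => setTimeout(resolve, 50));
});
```

Because nothing is awaited, a slow lifecycle cannot delay the caller, which is what keeps the next watcher tick on schedule.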

v1.5.0-rc.18

09 May 15:35
102439a

v1.5.0-rc.18 Pre-release

[1.5.0-rc.18] — 2026-05-09

Added

  • Fleet-aggregate stats subsystem (commits feature/v1.5-rc17). New ContainerStatsAggregator polls each locally-monitored container once per tick (default 10 s) and computes a fleet-wide ContainerStatsSummary (total CPU%, total memory, top-N rows). Two new endpoints — GET /api/v1/stats/summary and GET /api/v1/stats/summary/stream — expose the current snapshot and a live SSE feed; the dashboard Resource Usage widget now consumes the SSE stream directly, fixing the regression (introduced in rc.13 by the ?touch=false workaround) where the widget showed zeros because the per-container cache was never warmed. The legacy GET /api/v1/containers/stats endpoint and the client-side summarizeContainerResourceUsage rollup have been removed.

  • Per-container update locks (commit 761fb834). New keyed LockManager primitive in app/updates/lock-primitives.ts replaces the module-level pLimit(1) that was serialising every container update across the entire process. Lock keys are derived per container (and per compose project for Dockercompose), so two unrelated containers can now pull and recreate concurrently while two services in the same compose project still serialise correctly. The lock primitive is its own pure-logic file with full unit tests; the docker trigger and compose subclass derive the lock key set via a new getUpdateLockKeys(container) method.

  • Restart recovery for queued and pulling updates (commit 00788b13). Startup reconciliation in app/store/update-operation.ts is now selective: status=queued operations stay queued for the recovery dispatcher to pick up, and phase=pulling rows are reset to queued (pull is idempotent). All other in-progress phases — prepare, renamed, new-created, old-stopped, new-started, health-gate, rollback-* — remain marked failed because they leave inconsistent state that an operator should review. A new app/updates/recovery.ts module runs once after registry.init(), re-resolves trigger and container for each queued operation, and dispatches them through the existing fire-and-forget pipeline. Operations whose container or trigger no longer exists are marked failed with an explanatory lastError so they don't sit in the queue forever.

  • Notification outbox with retry and dead-letter queue (commits a9561d93, 7d2ef6eb, b215d295, ce26bece). New notificationOutbox LokiJS collection (app/store/notification-outbox.ts) and matching app/notifications/outbox-worker.ts background worker provide durable retry semantics for notification dispatch. Trigger.dispatchContainerForEvent now optimistically calls this.trigger(container) directly; on failure, the delivery intent is persisted to the outbox and the worker retries on a periodic drain with exponential backoff + jitter. After a configurable number of failed attempts (default 5) entries transition to the dead-letter queue; delivered and dead-letter entries are auto-purged past TTL (default 30 days). New /api/notifications/outbox REST surface lets operators list entries (?status= filter), retry from the DLQ (POST /:id/retry), or discard (DELETE /:id). New base method Trigger.dispatchOutboxEntry(entry) is the worker's delivery hook; subclasses can override.
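
To make the retry policy concrete, here is a small sketch of exponential backoff with jitter and the dead-letter cutoff. The notes do not give the exact formula, so the 1 s base, 5 min cap, and 25% jitter are assumptions, as are the function names; only the default of 5 attempts comes from the text above.

```typescript
// Delay before the next retry: base * 2^(attempt - 1), capped, plus up to 25% jitter.
// Base, cap, and jitter ratio are illustrative assumptions.
function nextRetryDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs = 1_000,
  capMs = 300_000,
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  const jitter = Math.floor(random() * (exponential / 4));
  return exponential + jitter;
}

type OutboxStatus = "pending" | "dead-letter";

// After maxAttempts failed deliveries the entry moves to the dead-letter queue.
function statusAfterFailure(failedAttempts: number, maxAttempts = 5): OutboxStatus {
  return failedAttempts >= maxAttempts ? "dead-letter" : "pending";
}
```

Passing `random` as a parameter keeps the delay calculation deterministic under test while still defaulting to `Math.random` in normal use.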

  • Notification outbox UI (commit feature/v1.5-rc17). New Notification outbox page (route /notifications/outbox, nav under Settings) consumes the existing /api/notifications/outbox REST surface so operators can review the dead-letter queue, retry stuck deliveries, or discard dead entries from the UI. Status tabs (Dead-letter / Pending / Delivered) keep the same query-param convention (?status=) used by the rest of the list views; counts per bucket render as inline badges. Retry is shown only on dead-letter rows; Discard is available everywhere. New ui/src/services/notification-outbox.ts mirrors the API exactly.

  • Cancel queued or in-flight updates (commits 4b79e3ac, 79487115). POST /api/operations/:id/cancel now accepts both queued and in-progress operations. Queued ops are marked failed immediately with lastError: 'Cancelled by operator' (200). In-progress ops are flagged via a new cancelRequested field on the operation row and the endpoint returns 202 Accepted; the lifecycle observes the flag at three safe checkpoints — after pull and before rename (clean abort, no rollback needed), before creating the replacement container, and before stopping the old container — so cancellations either short-circuit cleanly or fall through the existing rollback path that renames the container back. The rollback path tags the rollback reason as cancelled so the audit trail distinguishes operator cancellations from real failures. Already-terminal ops still return 409 Conflict. The container row's Cancel action is now visible for both queued and in-progress operations; the toast says "Cancelled" for the immediate path and "Cancellation requested" for the in-progress path.
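
The three response paths of the cancel endpoint can be summarised in a small decision function. This is a sketch under assumptions: the `Operation` shape, the status names other than `queued`, and the `requestCancel` name are hypothetical; the status codes, the `cancelRequested` flag, and the `lastError` text follow the description above.

```typescript
// Hypothetical operation shape for illustration.
interface Operation {
  status: "queued" | "in-progress" | "succeeded" | "failed";
  cancelRequested?: boolean;
  lastError?: string;
}

interface CancelResult {
  httpStatus: 200 | 202 | 409;
  operation: Operation;
}

function requestCancel(operation: Operation): CancelResult {
  switch (operation.status) {
    case "queued":
      // Nothing has started yet, so the operation can be terminated immediately.
      return {
        httpStatus: 200,
        operation: { ...operation, status: "failed", lastError: "Cancelled by operator" },
      };
    case "in-progress":
      // Only set a flag; the lifecycle observes it at its next safe checkpoint.
      return { httpStatus: 202, operation: { ...operation, cancelRequested: true } };
    default:
      // Already terminal: there is nothing left to cancel.
      return { httpStatus: 409, operation };
  }
}
```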

  • Global concurrent-update cap (DD_UPDATE_MAX_CONCURRENT). New counting semaphore (Semaphore class in app/updates/lock-primitives.ts) provides a configurable global gate on how many update lifecycles run simultaneously across the entire controller instance. Default 0 = unlimited — no behavior change on upgrade. Positive integer N means at most N updates run concurrently. Negative or non-integer values fail fast at startup with a descriptive error. The cap layers on top of the existing per-container and per-compose-project locks; it does not replace them. Operations waiting on the cap remain in queued status. Scope is per controller instance; distributed agent hosts have independent counters by design. Self-update operations bypass the global cap — they take per-container locks but never wait on the global semaphore, preventing a full update queue from starving an admin-triggered self-update.
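
The sketch below shows one way the cap's semantics could be implemented: parse the setting, treat 0 as unlimited, and queue callers once the limit is reached. `parseMaxConcurrent` and `CountingSemaphore` are hypothetical names, and treating an unset or empty value as 0 is an assumption; drydock's own `Semaphore` class may differ.

```typescript
// Parse DD_UPDATE_MAX_CONCURRENT-style input: 0 (or unset) means unlimited.
function parseMaxConcurrent(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return 0;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`DD_UPDATE_MAX_CONCURRENT must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

class CountingSemaphore {
  private readonly limit: number;
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  // Resolves once a slot is available; a limit of 0 never blocks.
  async acquire(): Promise<void> {
    if (this.limit === 0 || this.active < this.limit) {
      this.active++;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Hand the slot straight to the next waiter, or free it if nobody is waiting.
  release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.active--;
  }
}
```

In this sketch a self-update would simply skip `acquire()` on the global semaphore while still taking its per-container lock, which matches the bypass described above.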

  • Health-gate SSE heartbeat (DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS). While drydock waited for a new container to pass its health gate, the SSE pipeline was previously silent for the entire wait: the UI received no events between phase: 'health-gate' and phase: 'health-gate-passed'. For images with long healthcheck intervals (e.g. vaultwarden's 60 s check) this meant the UI had to fall back on the REST reconciliation poll if the SSE connection was interrupted during that window. A periodic heartbeat now re-emits phase: 'health-gate' at a configurable interval (default 10 s). DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS=0 disables heartbeats entirely; non-integer values and nonzero values below 1000 ms fail fast at startup. The heartbeat cancels immediately when the wait resolves in any direction (success, timeout, or unhealthy), ensuring the terminal event is never preempted. No new phases are introduced; existing UI consumers accept the re-emitted event unchanged.
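
A minimal sketch of the heartbeat behaviour, assuming a plain `setInterval` and an `emit` callback standing in for the SSE pipeline. `parseHeartbeatMs` and `startHealthGateHeartbeat` are hypothetical names, and treating an unset value as the 10 s default is an assumption.

```typescript
// Validate a DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS-style value.
// 0 disables the heartbeat; other values must be integers of at least 1000 ms.
function parseHeartbeatMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return 10_000;
  const value = Number(raw);
  if (!Number.isInteger(value) || (value !== 0 && value < 1000)) {
    throw new Error(`Heartbeat interval must be 0 or an integer >= 1000, got "${raw}"`);
  }
  return value;
}

// Re-emit the health-gate phase on every tick and return a stop function.
// The caller stops the heartbeat before emitting the terminal event.
function startHealthGateHeartbeat(
  intervalMs: number,
  emit: (event: { phase: "health-gate" }) => void,
): () => void {
  if (intervalMs === 0) return () => undefined; // heartbeats disabled
  const timer = setInterval(() => emit({ phase: "health-gate" }), intervalMs);
  return () => clearInterval(timer);
}
```

Calling the returned stop function in a `finally` block around the health-gate wait ensures the timer is cleared on success, timeout, and unhealthy outcomes alike.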

Changed

  • Crowdin export configuration aligned with app locale folders. Crowdin now maps language codes such as es-ES into the locale folder IDs the UI actually loads (for example es) and only downloads languages exposed in the locale picker. A new config guard test prevents future sync PRs from adding ignored region-coded folders, and the new auto-hidden-columns tooltip source avoids English-only column(s) punctuation that triggered Crowdin QA warnings for translated strings.
  • Shared DataTable column sizing overhaul (commit 596adcd2). All first-party table surfaces now route through the shared DataTable component with numeric sizing metadata (size, minSize, maxSize, flex, priority, overflow, autoSize) instead of ad-hoc string widths. Tables render a stable <colgroup>, keep actions in an independent sticky/fixed managed column, support pointer and keyboard column resizing, autosize a column to its visible content on double-click, and persist manual/autosized widths per table via browser preferences. The Containers table uses the sizing data for responsive auto-hide math so narrow widths hide lower-priority metadata instead of shrinking columns below readable minimums. The config webhook endpoint list was migrated too, and a new architecture test fails if raw <table> markup or string column widths reappear in ui/src.
  • Watcher dispatch is fully fire-and-forget (commit 5cfa2286). Trigger.runUpdateAvailableSimpleTrigger and runAcceptedUpdateBatch previously awaited runAcceptedContainerUpdates, so a slow update lifecycle stalled the next watcher tick. The API path was already fire-and-forget; the watcher path now matches. New dispatchAccepted(accepted) helper centralises the void runAcceptedContainerUpdates(...).catch(() => undefined) pattern across all four call sites. Per-operation failures are still terminalised inside the lifecycle handler, so swallowing the dispatch chain's rejection loses no observable information.
  • Security alert emit is non-blocking inside the update lifecycle (commit 6c5198dd). SecurityGate.maybeEmitHighSeverityAlert was awaited inside evaluateScanOutcome, which itself runs inside the update lifecycle's critical path. With multiple notifiers registered for security alerts, the await chained sequential provider calls (SMTP, Slack, HTTP, MQTT, webhook) into the lifecycle, multiplying latency before pull/recreate could even start. The function now returns synchronously after firing the emit; notification dispatch semantics from t...