Replies: 3 comments
-
|
Thanks for the thoughtful proposal! This aligns well with where Drydock is headed. What fits naturally: Drydock already has container actions (start/stop/restart) shipped since v1.2.0, and our roadmap includes health check gates (v2.1.0) and expanded container operations (v2.2.0). Adding proactive health monitoring with auto-restart sits right in that trajectory. The label-based opt-in you're describing fits perfectly with the existing
And since Drydock already has multi-host awareness via the agent architecture, this would work across all connected hosts out of the box — which is exactly the gap you identified with single-host tools. Notifications on auto-heal events would plug directly into the existing trigger system (Slack, webhook, email, etc.) with no extra work. Where I'd push back — log monitoring: The log keyword matching ("dual-trigger") is a significantly larger scope. Parsing container logs reliably means dealing with encoding detection, multiline formats, log rotation, rate limiting, and buffering — it's essentially a log aggregation engine. There are mature, dedicated tools for that (Loki + Promtail, Dozzle, etc.) that do it far better than we could as a side feature. I'd rather keep Drydock focused and let users pair it with a purpose-built log stack. Rough timeline: Health-status event notifications could land as early as v1.6–v1.7 as a new trigger event type. The full auto-heal loop (monitor → restart → verify → notify) fits best alongside the health check gates in v2.1.0. Appreciate you taking the time to write this up — it's a good signal that the multi-host health gap is real. |
Beta Was this translation helpful? Give feedback.
-
|
Perfect, thanks, looking forward. As for log monitoring, it was low priority, I have adjusted healthchecks manually to be log aware, so if keyword appears, container is unhealthy now. Waiting patiently for autoheal feature. 😊 |
Beta Was this translation helpful? Give feedback.
-
|
This is tracked on the roadmap at Phase 9.4. The label-based opt-in ( Health-status event notifications (alerting when a container enters unhealthy state, even before auto-restart is implemented) could land earlier — v1.6–1.7 — as a new trigger event type. The full monitor → restart → verify → notify loop is targeted alongside health check gates in the v2.1.0 window. The log-keyword trigger is out of scope; for that use case, pairing Drydock with a dedicated log tool (Loki, Dozzle) and surfacing the result as a Docker healthcheck is the right pattern, as you've already discovered. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Summary
I would like to propose adding an Auto-Heal engine to Drydock. While Docker can restart containers that crash, it lacks a native mechanism to automatically restart containers that become unhealthy while still running.
Current third-party solutions (like docker-autoheal or docker-surgeon) are often unmaintained, lack multi-host support, or require complex sidecar setups. Since Drydock already has secure access to multiple Docker hosts (via sockets/HTTPS), it is perfectly positioned to act as a centralized self-healing orchestrator.
The Problem
When a container enters an unhealthy state:
Proposed Solution
Integrate a monitoring loop within Drydock that polls the health status of containers across all connected hosts and performs corrective actions (Stop/Start or Restart) based on predefined rules.
Key Features
Why Drydock?
Drydock already possesses the "keys to the kingdom"—it has the connectivity and the GUI. Adding this would turn Drydock from a management dashboard into a proactive stability tool, filling a significant gap in the Docker ecosystem.
Beta Was this translation helpful? Give feedback.
All reactions