Skip to content

feat(scripts): automated health checks with HTML dashboard and Telegram alerts#1530

Open
hodgesz wants to merge 5 commits intoNVIDIA:mainfrom
hodgesz:feat/health-check-clean
Open

feat(scripts): automated health checks with HTML dashboard and Telegram alerts#1530
hodgesz wants to merge 5 commits intoNVIDIA:mainfrom
hodgesz:feat/health-check-clean

Conversation

@hodgesz
Copy link
Copy Markdown

@hodgesz hodgesz commented Apr 6, 2026

Summary

  • scripts/health-check.sh — probes all critical NemoClaw services (Docker, gateway, sandbox, SSH tunnel, inference endpoints, dashboard port forward, Telegram bridge, openclaw doctor) and reports pass/fail status. Supports pretty, JSON, and quiet output modes. Optional --alert telegram sends notifications only on failure.
  • scripts/health-check-html.sh — generates a self-contained HTML status dashboard from the health check JSON output. Auto-refreshes every 60s, works on macOS and Linux, no dependencies beyond bash and curl.
  • scripts/telegram-bridge.js — adds /status command so users can check service health directly from Telegram.
  • docs/monitoring/automated-health-checks.md — full documentation with usage, scheduling (launchd/cron/systemd), and configuration reference.

Motivation

Always-on NemoClaw assistants have multiple interdependent services that can silently fail after reboots, Docker restarts, or network changes. There is currently no way to detect these failures until something stops working. This adds a host-side observability layer that complements the existing nemoclaw status and openshell term tools.

Relates to #233 (observability/metrics proposal) and #1430 (Docker HEALTHCHECK).

Test plan

  • Run ./scripts/health-check.sh and verify all checks report correctly
  • Run ./scripts/health-check.sh --json and verify valid JSON output
  • Run ./scripts/health-check.sh --quiet and verify exit code matches health state
  • Run ./scripts/health-check.sh --skip docker,ssh and verify those checks are skipped
  • Run ./scripts/health-check-html.sh --open and verify HTML page renders with correct counts
  • Stop a service (e.g. openshell forward stop) and verify the check detects it
  • Set TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID and run with --alert telegram to verify alerting
  • Send /status to the Telegram bot and verify it replies with health results

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Automated health checks for NemoClaw services with selectable targets, skip options, and three output modes (pretty, JSON, quiet).
    • Self-contained HTML status dashboard generation from health-check results with optional auto-open.
    • Messaging-platform status command to request formatted health reports and optional Telegram alerts.
  • Documentation

    • New guide covering usage, scheduling examples (launchd/cron/systemd), output modes, alert setup, CLI flags, and exit codes.

…legram /status

Add host-side service health monitoring for NemoClaw deployments:

- scripts/health-check.sh: probes Docker, gateway, sandbox, SSH tunnel,
  inference endpoints, dashboard, Telegram bridge, and openclaw doctor.
  Supports pretty, JSON, and quiet output modes. Optional --alert telegram
  sends notifications only on failure.
- scripts/health-check-html.sh: generates a self-contained HTML status
  dashboard from health-check.sh JSON output. Auto-refreshes every 60s.
- scripts/telegram-bridge.js: add /status command to check service health
  directly from Telegram.
- docs/monitoring/automated-health-checks.md: full documentation with
  usage, scheduling (launchd/cron/systemd), and configuration reference.

Relates to NVIDIA#233 (observability proposal) and NVIDIA#1430 (Docker HEALTHCHECK).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a health-check system: a new CLI that probes NemoClaw components, an HTML status generator, Telegram /status integration in the bridge, and documentation covering usage, scheduling, output modes, alerts, and exit codes. (48 words)

Changes

Cohort / File(s) Summary
Health Check Core
scripts/health-check.sh, scripts/health-check-html.sh
Adds health-check.sh (ordered probes: docker, gateway, sandbox, SSH, inference endpoints, dashboard port-forward, Telegram bridge liveness, in-sandbox openclaw doctor), CLI flags (--sandbox, --gateway, --port, --alert, --json, --quiet, --skip), JSON/pretty/quiet output modes, and exit codes (0/1/2). Adds health-check-html.sh to convert JSON output into a self-contained HTML dashboard (--output, --open) and forwards args to health-check.sh.
Documentation
docs/monitoring/automated-health-checks.md
New published doc describing probe commands, execution patterns (all, --sandbox, --skip), output modes (pretty, --json, --quiet), --alert telegram usage and required env vars, scheduling examples (launchd, cron, systemd guidance), HTML generation, Telegram /status retrieval, CLI flags, and exit codes.
Telegram Bridge Integration
scripts/telegram-bridge.js
Adds /status handling: sends typing action, runs health-check.sh --sandbox <SANDBOX> --json (60s timeout), parses JSON, formats per-check icons and summary, replies to the user; on execution/parsing error replies with an error message and continues processing.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Bridge as Telegram Bridge
    participant HealthCheck as ./scripts/health-check.sh
    participant Probes as Service Probes
    participant TelegramAPI as Telegram API

    User->>Bridge: /status
    Bridge->>TelegramAPI: sendChatAction(typing)
    Bridge->>HealthCheck: exec --sandbox <SANDBOX> --json
    HealthCheck->>Probes: run probes (docker, gateway, sandbox, SSH, inference, dashboard, bridge, openclaw)
    Probes-->>HealthCheck: JSON results
    HealthCheck-->>Bridge: stdout JSON
    Bridge->>TelegramAPI: reply(formatted status)
    TelegramAPI-->>User: status message
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hop through ports and shelly lands,

I nudge the daemons with tiny hands.
JSON beeps, the HTML shines bright,
Telegram hums through moonlit night.
I guard the services till morning light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.77% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately summarizes the main changes: adding automated health checks, an HTML dashboard, and Telegram alerts to scripts. It is concise, specific, and clearly conveys the primary additions to the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch feat/health-check-clean

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (4)
docs/monitoring/automated-health-checks.md (4)

196-200: Use the required Next Steps section title here.

This bottom section already has the right content, so renaming the heading is enough to match the docs template.

As per coding guidelines, A "Next Steps" section at the bottom links to related pages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/monitoring/automated-health-checks.md` around lines 196 - 200, Rename
the bottom heading currently titled "Related Topics" to the required "Next
Steps" in the automated-health-checks document; keep the existing link list
(Monitor Sandbox Activity, Recovery After Reboot, Troubleshooting) unchanged and
only update the header text so the section matches the docs template.

2-4: Make the H1 match title.page.

The frontmatter title is Automated Health Checks for NemoClaw Services, but the page heading drops for NemoClaw Services.

As per coding guidelines, "H1 heading matches the title.page frontmatter value."

Also applies to: 23-23

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/monitoring/automated-health-checks.md` around lines 2 - 4, The page H1
does not match the frontmatter title.page "Automated Health Checks for NemoClaw
Services"; update the top-level heading in the markdown body to exactly match
the frontmatter value (including "for NemoClaw Services") so the H1 equals
title.page, and repeat the same change for the other occurrence referenced (line
23) to ensure consistency.

27-39: Add lead-in sentences before the table/code-first sections.

What It Checks, Running on a Schedule, Linux (cron), Configuration, and Exit Codes jump straight into a table or code block. The docs template expects each H2/H3 section to open with a short introductory sentence.

As per coding guidelines, "Sections use H2 and H3, each starting with an introductory sentence."

Also applies to: 106-108, 144-150, 176-195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/monitoring/automated-health-checks.md` around lines 27 - 39, Add a
one-sentence lead-in before any table or code-first sections so each H2/H3 opens
with an introductory sentence; specifically insert a short intro paragraph
before the "What It Checks" table and do the same for the "Running on a
Schedule", "Linux (cron)", "Configuration", and "Exit Codes" sections (and any
other sections noted in the review) so the section heading is followed by a
brief explanatory sentence before the table or code block.

25-25: Tighten the narrative prose formatting.

Lines 25, 104, 154, 164, and 174 put multiple sentences on one source line; Lines 104 and 170 use passive voice; and Line 154 uses robust, which the style guide flags as marketing language.

As per coding guidelines, "One sentence per line in source (makes diffs readable).", "Active voice required. Flag passive constructions.", and "Superlatives and marketing language ('powerful,' 'robust,' 'seamless,' 'cutting-edge'). Say what it does, not how great it is."

Also applies to: 104-104, 154-154, 164-164, 170-170, 174-174

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/monitoring/automated-health-checks.md` at line 25, Split any source
lines that contain multiple sentences in automated-health-checks.md so each
sentence appears on its own line (improves diffs); convert passive constructions
to active voice (e.g., rephrase sentences that say "is/was X" to "the script X
does Y") in the sentences referencing scripts/health-check.sh and surrounding
descriptions; and remove the marketing word "robust"—replace it with a concrete
description of what the check actually verifies (e.g., "checks service
responsiveness and exit codes"). Focus edits on the lines flagged (the sentences
mentioning scripts/health-check.sh and nearby explanatory lines) and keep
meaning unchanged while applying one-sentence-per-line and active-voice rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/health-check-html.sh`:
- Around line 78-80: The script currently injects unescaped values (SANDBOX,
name, detail) into the generated HTML which allows HTML/script injection; add an
html_escape function (as described in the review) near the parsing section and
call it whenever writing SANDBOX, name, detail (and any other dynamic fields
parsed around lines 100-105 and 144-146) into the HTML output so that &, <, >, "
and ' are replaced with their HTML entities before output; update usages where
name and detail are set/printed to use the escaped value (e.g.,
escaped_name=$(html_escape "$name") and escaped_detail=$(html_escape "$detail"))
and write those escaped variables to the file.

In `@scripts/health-check.sh`:
- Around line 62-64: The check_gateway logic is producing false positives
because it uses substring matches and only echoes GATEWAY_NAME in messages;
update the check_gateway function to perform exact matches for both gateway name
and status: validate that GATEWAY_NAME actually exists in the gateway list (not
just echoed) and when checking status use precise comparison (e.g., full-field
equality or word-boundary matching instead of substring) so "healthy" doesn't
match "unhealthy" and "connected" doesn't match "disconnected"; ensure the
checks in the block handling --gateway (GATEWAY_NAME and the status checks
around lines 135-149) use exact matching and proper field extraction (e.g., awk
or grep -w on the gateway name and grep/^pattern$/ or string equality for
status) before reporting healthy.
- Around line 158-166: The current check greps for "ready" before checking
failures, so outputs like "sandbox is not ready" wrongly pass; update the logic
that inspects sandbox_out (variable sandbox_out from the openshell sandbox get
call) to first detect failure cases — e.g., check for "not found|does not exist"
and then explicitly check for "not ready" (grep -qi 'not ready') before testing
for "ready" — and use the existing record function with SANDBOX_NAME to mark the
correct pass/fail outcomes.
- Around line 56-94: The script currently swallows unknown arguments in the
while/case block (the default "*) shift ;;" branch), which hides typos; update
that default case to print an error to stderr (e.g. "Unknown option: $1"), show
usage/help (reuse the existing sed -n '2,/^$/s/^# *//p' "$0" or a short hint),
and exit non‑zero (exit 1) instead of shifting so the script fails fast; change
the default pattern in the case statement that currently contains "*) shift ;;"
to perform these actions.
- Around line 237-249: The health check in check_bridge currently treats
TELEGRAM_BOT_TOKEN alone as reason to expect the telegram-bridge, causing false
failures; change the prerequisite check to match scripts/start-services.sh by
expecting the bridge only when both TELEGRAM_BOT_TOKEN and NVIDIA_API_KEY are
set (i.e., in check_bridge use a condition that requires both TELEGRAM_BOT_TOKEN
and NVIDIA_API_KEY before recording "fail" for missing PID), keeping the
existing PID file logic (piddir/pidfile and kill -0) and the skip branch for no
token/API key.
- Around line 303-319: Add a json_escape helper that escapes backslashes, double
quotes, newlines, carriage returns and tabs (e.g., json_escape function) and
call it when emitting any user/command data in the JSON branch: escape
$SANDBOX_NAME and $GATEWAY_NAME before printing the "sandbox" and "gateway"
fields and escape the per-check detail (from RESULTS loop) before inserting into
the "detail" value; replace direct uses of "$sandbox"/"$gateway"/"$detail" in
the printf/echo with the escaped values so generated JSON remains valid for
scripts like scripts/telegram-bridge.js and scripts/health-check-html.sh.
- Around line 258-272: The health-check currently treats a non-zero openclaw
doctor exit as healthy; change the branching in the doctor_out/doctor_exit logic
so you first test doctor_exit (the exit code parsed into the variable
doctor_exit) and if it is non-zero record a failure via record "agent" "fail"
with a concise message (e.g., include the first 80 chars of doctor_out or a
trimmed summary) before falling back to the grep of error|fail|unhealthy; keep
the existing grep-based failure as a more specific message but ensure the
non-zero doctor_exit path always records fail, using the existing variables
doctor_out and doctor_exit and the record function to emit the failure.

---

Nitpick comments:
In `@docs/monitoring/automated-health-checks.md`:
- Around line 196-200: Rename the bottom heading currently titled "Related
Topics" to the required "Next Steps" in the automated-health-checks document;
keep the existing link list (Monitor Sandbox Activity, Recovery After Reboot,
Troubleshooting) unchanged and only update the header text so the section
matches the docs template.
- Around line 2-4: The page H1 does not match the frontmatter title.page
"Automated Health Checks for NemoClaw Services"; update the top-level heading in
the markdown body to exactly match the frontmatter value (including "for
NemoClaw Services") so the H1 equals title.page, and repeat the same change for
the other occurrence referenced (line 23) to ensure consistency.
- Around line 27-39: Add a one-sentence lead-in before any table or code-first
sections so each H2/H3 opens with an introductory sentence; specifically insert
a short intro paragraph before the "What It Checks" table and do the same for
the "Running on a Schedule", "Linux (cron)", "Configuration", and "Exit Codes"
sections (and any other sections noted in the review) so the section heading is
followed by a brief explanatory sentence before the table or code block.
- Line 25: Split any source lines that contain multiple sentences in
automated-health-checks.md so each sentence appears on its own line (improves
diffs); convert passive constructions to active voice (e.g., rephrase sentences
that say "is/was X" to "the script X does Y") in the sentences referencing
scripts/health-check.sh and surrounding descriptions; and remove the marketing
word "robust"—replace it with a concrete description of what the check actually
verifies (e.g., "checks service responsiveness and exit codes"). Focus edits on
the lines flagged (the sentences mentioning scripts/health-check.sh and nearby
explanatory lines) and keep meaning unchanged while applying
one-sentence-per-line and active-voice rules.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 794c19e2-cc8e-4e15-b1ad-7a1d120b9627

📥 Commits

Reviewing files that changed from the base of the PR and between 364969d and 9d5d321.

📒 Files selected for processing (4)
  • docs/monitoring/automated-health-checks.md
  • scripts/health-check-html.sh
  • scripts/health-check.sh
  • scripts/telegram-bridge.js

@wscurran wscurran added NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). observability Use this label to improve NemoClaw logging, metrics, and tracing. Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw. enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. labels Apr 6, 2026
@wscurran
Copy link
Copy Markdown
Contributor

wscurran commented Apr 6, 2026

✨ Thanks for submitting this suggestion, which proposes a way to enhance NemoClaw's observability by adding automated health checks, an HTML status dashboard, and Telegram alerts for service failures.


Possibly related open issues:

- health-check.sh: add json_escape(), word-boundary grep, unknown-arg
  error handling, sandbox failure-first ordering, bridge dual-token
  check, agent exit-code failure detection
- health-check-html.sh: add html_escape() for XSS prevention on all
  dynamic values injected into the HTML output
- docs: match H1 to page title, add intro sentences before tables,
  rename Related Topics to Next Steps, one sentence per line

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
scripts/health-check.sh (1)

151-168: ⚠️ Potential issue | 🟠 Major

check_gateway does not verify the specified GATEWAY_NAME exists.

The function uses word-boundary matching for status keywords (addressing substring false positives), but it still doesn't validate that GATEWAY_NAME actually appears in the openshell status output. The check reports success based on any gateway being healthy/connected, not specifically the one specified via --gateway.

🔧 Suggested fix to verify the gateway name
 check_gateway() {
   if should_skip "gateway"; then record "gateway" "skip" ""; return; fi
   if ! command -v openshell >/dev/null 2>&1; then
     record "gateway" "fail" "openshell CLI not found"
     return
   fi
   local status_out
   status_out="$(openshell status 2>&1 || true)"
+  # Verify the specified gateway exists in the output
+  if ! echo "$status_out" | grep -qw "$GATEWAY_NAME"; then
+    record "gateway" "fail" "Gateway '$GATEWAY_NAME' not found"
+    return
+  fi
   # Use word-boundary matching to avoid "unhealthy" matching "healthy"
   # and "disconnected" matching "connected".
-  if echo "$status_out" | grep -qiw "healthy"; then
+  if echo "$status_out" | grep -w "$GATEWAY_NAME" | grep -qiw "healthy"; then
     record "gateway" "pass" "Gateway '$GATEWAY_NAME' healthy"
-  elif echo "$status_out" | grep -qiw "connected"; then
+  elif echo "$status_out" | grep -w "$GATEWAY_NAME" | grep -qiw "connected"; then
     record "gateway" "pass" "Gateway '$GATEWAY_NAME' connected"
   else
     record "gateway" "fail" "Gateway not healthy"
   fi
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/health-check.sh` around lines 151 - 168, The check_gateway function
currently treats any healthy/connected gateway as success; update check_gateway
to first filter the openshell status output for lines containing the specific
GATEWAY_NAME (use a safe literal match, e.g. grep -F -- "$GATEWAY_NAME") and if
no matching line is found call record "gateway" "fail" "Gateway '$GATEWAY_NAME'
not found"; otherwise run your existing word-boundary checks (grep -qiw
"healthy" / grep -qiw "connected") against only the matched line(s) and record
pass/fail accordingly; keep using the existing record messages and preserve
quoting and the openshell status capture (status_out="$(openshell status 2>&1 ||
true)"), but apply the gateway-name filter before testing healthy/connected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/health-check.sh`:
- Around line 111-120: The json_escape function currently uses echo to output
the escaped string which can misinterpret leading hyphenated sequences (e.g.,
-n, -e) as flags; update the function json_escape to use printf '%s' "$s" (or
equivalent printf invocation) instead of echo "$s" so the escaped string is
printed literally and safely without flag interpretation.

---

Duplicate comments:
In `@scripts/health-check.sh`:
- Around line 151-168: The check_gateway function currently treats any
healthy/connected gateway as success; update check_gateway to first filter the
openshell status output for lines containing the specific GATEWAY_NAME (use a
safe literal match, e.g. grep -F -- "$GATEWAY_NAME") and if no matching line is
found call record "gateway" "fail" "Gateway '$GATEWAY_NAME' not found";
otherwise run your existing word-boundary checks (grep -qiw "healthy" / grep
-qiw "connected") against only the matched line(s) and record pass/fail
accordingly; keep using the existing record messages and preserve quoting and
the openshell status capture (status_out="$(openshell status 2>&1 || true)"),
but apply the gateway-name filter before testing healthy/connected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8ae53b31-9ffb-4d19-8d55-1d3a29b9e927

📥 Commits

Reviewing files that changed from the base of the PR and between 9d5d321 and 5f7c81f.

📒 Files selected for processing (3)
  • docs/monitoring/automated-health-checks.md
  • scripts/health-check-html.sh
  • scripts/health-check.sh
✅ Files skipped from review due to trivial changes (2)
  • scripts/health-check-html.sh
  • docs/monitoring/automated-health-checks.md

…tation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/health-check.sh (1)

262-263: Minor: PID file read twice creates a small race window.

The PID file is read twice—once for the kill -0 check and once for the message. If the bridge restarts in between, the displayed PID may not match the checked one. Storing the PID in a variable avoids this.

♻️ Suggested fix
-  if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
-    record "bridge" "pass" "Telegram bridge running (PID $(cat "$pidfile"))"
+  local pid
+  if [ -f "$pidfile" ] && pid=$(cat "$pidfile") && kill -0 "$pid" 2>/dev/null; then
+    record "bridge" "pass" "Telegram bridge running (PID $pid)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/health-check.sh` around lines 262 - 263, Read the PID once into a
variable and use that variable for both the kill check and the record message to
avoid the race window; specifically, replace the inline cat "$pidfile" usage in
the if condition with a prior pid="$(cat "$pidfile")" (or similar) and then use
pid in kill -0 and in the record "bridge" "pass" message so the checked PID and
the logged PID are identical; update the block around the if that references
pidfile and the record call accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/health-check.sh`:
- Around line 262-263: Read the PID once into a variable and use that variable
for both the kill check and the record message to avoid the race window;
specifically, replace the inline cat "$pidfile" usage in the if condition with a
prior pid="$(cat "$pidfile")" (or similar) and then use pid in kill -0 and in
the record "bridge" "pass" message so the checked PID and the logged PID are
identical; update the block around the if that references pidfile and the record
call accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 81ca3bd5-0891-4d91-b20b-fa5535d1754c

📥 Commits

Reviewing files that changed from the base of the PR and between 5f7c81f and 07f7413.

📒 Files selected for processing (1)
  • scripts/health-check.sh

openshell embeds color codes in its output (e.g. \e[32mConnected\e[39m),
which breaks word-boundary grep matching and causes false gateway failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/health-check.sh`:
- Around line 365-371: The suggested command in the ALERT_TEXT message should
quote the sandbox name to handle spaces/special chars; update the Run: line
inside the ALERT_TEXT variable so it includes quotes around $SANDBOX_NAME (e.g.,
--sandbox "$SANDBOX_NAME") ensuring the quotes are escaped as needed inside the
ALERT_TEXT string.
- Around line 259-274: The check_bridge function currently reads the PID file
twice via "$(cat "$pidfile")", causing TOCTOU and duplicate IO; change it to
read the PID once into a local variable (e.g., local pid) after confirming the
pidfile exists (using pid="$(cat "$pidfile")"), then use that pid variable for
the kill -0 check and in the record message; update references in check_bridge
(piddir, pidfile, record) so the pidfile is only read once and reused.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d866cf4b-ce49-47c2-b71e-7571971bd1fb

📥 Commits

Reviewing files that changed from the base of the PR and between 07f7413 and 62884de.

📒 Files selected for processing (1)
  • scripts/health-check.sh

Addresses CodeRabbit review: avoid TOCTOU race from double cat on PID
file, and quote $SANDBOX_NAME in the alert's suggested command.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/health-check.sh`:
- Around line 373-376: The Telegram API curl call that sets RESULT (using
TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID and ALERT_TEXT) lacks a timeout and can
hang; update that curl invocation to include a sensible timeout flag (e.g., add
--max-time 3 and/or --connect-timeout 2) so the call times out like the other
curl uses in the script and the script won't block indefinitely.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f99be2a7-5699-4c5e-80db-7b72750c2574

📥 Commits

Reviewing files that changed from the base of the PR and between 62884de and 067fecb.

📒 Files selected for processing (1)
  • scripts/health-check.sh

Comment on lines +373 to +376
RESULT=$(curl -s -X POST \
"https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d chat_id="$TELEGRAM_CHAT_ID" \
--data-urlencode "text=${ALERT_TEXT}" 2>&1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a timeout to the Telegram API call.

Unlike other curl calls in the script (which all use --max-time 3), this Telegram API request has no timeout. If the Telegram API is slow or unreachable, the script could hang indefinitely—problematic for cron/launchd/systemd scheduled runs.

🔧 Suggested fix
         RESULT=$(curl -s -X POST \
+          --max-time 10 \
           "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
           -d chat_id="$TELEGRAM_CHAT_ID" \
           --data-urlencode "text=${ALERT_TEXT}" 2>&1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/health-check.sh` around lines 373 - 376, The Telegram API curl call
that sets RESULT (using TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID and ALERT_TEXT)
lacks a timeout and can hang; update that curl invocation to include a sensible
timeout flag (e.g., add --max-time 3 and/or --connect-timeout 2) so the call
times out like the other curl uses in the script and the script won't block
indefinitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. Integration: Telegram Use this label to identify Telegram bot integration issues with NemoClaw. NemoClaw CLI Use this label to identify issues with the NemoClaw command-line interface (CLI). observability Use this label to improve NemoClaw logging, metrics, and tracing.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants