
fix(onboard): refresh provider state on agent changes #3857

Merged
ericksoa merged 6 commits into main from fix/onboard-agent-scope-brave-policy
May 21, 2026

Conversation

@ericksoa
Contributor

@ericksoa ericksoa commented May 20, 2026

Summary

  • refresh saved provider/model/inference state when a resumed onboard switches agents, so Hermes Provider and managed-tool selections cannot leak into an explicit OpenClaw resume
  • keep Hermes Provider options manifest-scoped to Hermes while allowing agent changes to re-run provider selection instead of hard-failing resume
  • stop auto-preserving/applying the built-in Brave policy when Brave search is declined, while keeping Brave selectable and preserving custom/explicit policy choices

Validation

  • npm run build:cli
  • npm test -- src/lib/agent/defs.test.ts test/hermes-provider-foundation.test.ts test/onboard-policy-suggestions.test.ts test/policy-tiers-onboard.test.ts test/onboard-preset-diff.test.ts test/onboard.test.ts

Notes

  • Local full test-cli pre-commit/pre-push coverage was attempted twice and failed in unrelated timeout/CLI-dispatch cases outside this change surface; pushed with SKIP=test-cli after the focused suite and non-test hooks passed.

Summary by CodeRabbit

  • Bug Fixes

    • Switching agents during resume no longer reports a false conflict; agent-scoped state is cleared and provider selection re-runs.
    • Sandbox reuse is blocked when the agent changes.
    • Built-in Brave search preset is excluded by default when web search is not configured.
  • New Features

    • Resume flow now force-refreshes provider and policy selection when agent type changes.
    • Background model-router processes are detected and stopped when switching agents; session step statuses are reset accordingly.
  • Tests

    • Added/updated tests covering agent-resume behavior, Brave preset handling, and related onboarding flows.


Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 20, 2026

Walkthrough

Detect agent changes during resume to clear agent-scoped session state and stop tracked model-router processes; treat built-in "brave" as stale when webSearchConfig is absent, filtering it from suggestions and carry-forward logic. Tests added/updated across onboarding and policy tiers.

Changes

Resume Agent Change & Brave Preset Handling

Layer / File(s) Summary
Agent Definition & Resolution Tests
src/lib/agent/defs.test.ts
OpenClaw manifest now asserts inferenceProviderOptions is an empty array; new test ensures explicit agentFlag overrides NEMOCLAW_AGENT.
Stale Brave Preset Detection & Tests
src/lib/onboard/policy-selection.ts, test/onboard-policy-suggestions.test.ts, test/onboard-preset-diff.test.ts, test/policy-tiers-onboard.test.ts
Adds isStaleBuiltinBravePolicyPreset; filters builtin brave when webSearchConfig is absent; updates unit and integration tests to expect removal unless explicitly enabled.
Agent-Scoped Resume State Helpers & Tests
src/lib/onboard/agent-resume-state.ts, test/onboard.test.ts
New helpers: normalizeAgentNameForResumeState, resetStepForAgentChange, clearAgentScopedResumeState; tests extended to validate clearing provider/auth/model/router/session step states when resuming into a different agent.
Model Router Lifecycle Helpers
src/lib/onboard/model-router-process.ts
New router helpers: health check, PID liveness check, graceful shutdown with SIGTERM→SIGKILL escalation, and stopTrackedModelRouterForAgentChange wrapper.
Onboard Resume Integration & Sandbox Decision
src/lib/onboard.ts
Integrates agent-change detection into resume: sets resumeAgentChanged, forces provider selection, calls stopTrackedModelRouterForAgentChange, invokes clearAgentScopedResumeState, prevents sandbox reuse when agent changed, removes agent mismatch from resume conflicts, and refreshes policy carry-forward behavior. Exports clearAgentScopedResumeState.
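
To make the state-clearing behavior concrete, here is a minimal sketch of what a helper like clearAgentScopedResumeState might do. The `SketchSession` shape, most field names, and the rollback rule for `lastCompletedStep` are assumptions for illustration; only `credentialEnv`, `routerPid`, `routerCredentialHash`, and the step names appear in this PR's discussion, and the real helper in src/lib/onboard/agent-resume-state.ts may differ.

```typescript
// Hypothetical session shape; the real Session type lives in the repository.
type StepStatus = "pending" | "in_progress" | "complete";

interface SketchSession {
  agent: string | null;
  provider: string | null;
  model: string | null;
  endpointUrl: string | null;
  credentialEnv: string | null;
  routerPid: number | null;
  routerCredentialHash: string | null;
  policyPresets: string[];
  steps: Record<string, StepStatus>;
  lastCompletedStep: string | null;
}

// Step names as listed in the PR Review Advisor summary.
const AGENT_SCOPED_STEPS: string[] = [
  "provider_selection",
  "inference",
  "sandbox",
  "openclaw",
  "agent_setup",
  "policies",
];

// Assumption: agent names compare case-insensitively after trimming.
function normalizeAgentName(name: string | null | undefined): string {
  return (name ?? "").trim().toLowerCase();
}

function clearAgentScopedState(session: SketchSession, selectedAgent: string): SketchSession {
  const next = normalizeAgentName(selectedAgent);
  // Same agent: nothing is agent-scoped stale, so return the session untouched.
  if (normalizeAgentName(session.agent) === next) return session;

  // Reset only the steps whose results depend on the agent.
  const steps: Record<string, StepStatus> = { ...session.steps };
  for (const step of AGENT_SCOPED_STEPS) {
    if (step in steps) steps[step] = "pending";
  }

  // Simplification: drop lastCompletedStep if it pointed at a reset step.
  const lastCompletedStep =
    session.lastCompletedStep !== null && AGENT_SCOPED_STEPS.includes(session.lastCompletedStep)
      ? null
      : session.lastCompletedStep;

  return {
    ...session,
    agent: next,
    provider: null,
    model: null,
    endpointUrl: null,
    credentialEnv: null,
    routerPid: null,
    routerCredentialHash: null,
    policyPresets: [],
    steps,
    lastCompletedStep,
  };
}
```

Returning a new object rather than mutating in place keeps the sketch easy to test; the actual helper is invoked through onboardSession.updateSession and may mutate the session instead.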

Sequence Diagram(s)

sequenceDiagram
  participant ResumeHandler as Onboard Resume
  participant AgentCompare as Agent Comparison
  participant RouterMgr as stopTrackedModelRouterForAgentChange
  participant StateClear as clearAgentScopedResumeState
  participant ConflictCheck as getResumeConfigConflicts
  participant SandboxLogic as Sandbox Reuse Logic

  ResumeHandler->>AgentCompare: compare normalized recordedAgent vs selectedAgent
  AgentCompare-->>ResumeHandler: resumeAgentChanged flag
  ResumeHandler->>RouterMgr: stop tracked model-router if pid present
  ResumeHandler->>StateClear: clear agent-scoped resume/session state
  StateClear-->>ResumeHandler: updated Session
  ResumeHandler->>ConflictCheck: recompute resume config conflicts
  ConflictCheck-->>ResumeHandler: return conflicts (agent mismatch not a conflict)
  ResumeHandler->>SandboxLogic: evaluate reuse (requires !resumeAgentChanged)
  SandboxLogic-->>ResumeHandler: decide recreate or reuse
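
The ordering in the diagram above can be sketched as a small planning function. This is an illustration under stated assumptions, not the code in src/lib/onboard.ts: `planResume`, `ResumeInput`, and `ResumeDeps` are hypothetical names, the router stop is modeled as a synchronous callback, and the conflict recomputation step is omitted.

```typescript
// Hypothetical inputs; the real resume handler reads these from the saved session.
interface ResumeInput {
  recordedAgent: string;
  selectedAgent: string;
  routerPid: number | null;
  sandboxExists: boolean;
}

// Side effects are injected so the ordering can be observed in isolation.
interface ResumeDeps {
  stopTrackedRouter: (pid: number) => void;
  clearAgentScopedState: () => void;
}

interface ResumeDecision {
  resumeAgentChanged: boolean;
  forceProviderSelection: boolean;
  reuseSandbox: boolean;
}

function planResume(input: ResumeInput, deps: ResumeDeps): ResumeDecision {
  const normalize = (name: string): string => name.trim().toLowerCase();
  const resumeAgentChanged =
    normalize(input.recordedAgent) !== normalize(input.selectedAgent);

  if (resumeAgentChanged) {
    // Stop the tracked router first, so no live router outlives its credentials.
    if (input.routerPid !== null) deps.stopTrackedRouter(input.routerPid);
    // Only then discard the agent-scoped session state.
    deps.clearAgentScopedState();
  }

  return {
    resumeAgentChanged,
    // An agent change re-runs provider selection instead of failing the resume.
    forceProviderSelection: resumeAgentChanged,
    // Sandbox reuse requires that the agent did not change.
    reuseSandbox: input.sandboxExists && !resumeAgentChanged,
  };
}
```

Stopping the router before clearing state matches the concern raised in review: clearing `routerPid` first would leave a healthy endpoint that later reconciliation cannot associate with known credentials.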

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#1464: Both PRs adjust onboarding/policy selection behavior around the built-in brave preset and webSearchConfig presence.

Suggested reviewers

  • jyaunches
  • cv

Poem

🐰
When agents swap mid-resume at dawn,
I tidy sessions fore and yon.
Brave fades when its keys don't gleam,
Routers sleep and steps redeem,
Resume hops fresh across the lawn.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main objective of the PR: refreshing provider state when agents change during resume. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



Comment thread src/lib/onboard.ts Fixed
@github-actions
Contributor

github-actions Bot commented May 20, 2026

E2E Advisor Recommendation

Required E2E: onboard-resume-e2e, model-router-provider-routed-inference-e2e, brave-search-e2e
Optional E2E: hermes-e2e, double-onboard-e2e, cloud-onboard-e2e

Dispatch hint: onboard-resume-e2e,brave-search-e2e

Auto-dispatched E2E: onboard-resume-e2e, brave-search-e2e via nightly-e2e.yaml at f1e26dbc7ed4d1855beeea89ead3257b9c2af33e (nightly run)

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • onboard-resume-e2e (high): Directly exercises interrupted onboard followed by onboard --resume, including cached step reuse and policy completion. This is the closest existing E2E coverage for the changed resume/session-state logic.
  • model-router-provider-routed-inference-e2e (high): Required because model-router process health/stop helpers were moved and onboarding now calls them during agent-change resume handling. This existing guard verifies provider-routed Model Router onboard produces a working inference.local route.
  • brave-search-e2e (high): Required because policy-selection behavior for the built-in Brave preset changed. This existing E2E verifies Brave web-search onboarding applies the Brave policy, wires OpenClaw config, avoids leaking the key, and performs real Brave access through the sandbox.

Optional E2E

  • hermes-e2e (high): Useful adjacent confidence for multi-agent onboarding because the PR changes agent selection precedence and agent-scoped resume state. It validates a fresh Hermes install/onboard/health/live-inference path, but does not specifically exercise changing agents during resume.
  • double-onboard-e2e (high): Useful lifecycle confidence for repeated onboarding, gateway reuse, and sandbox recreation after changes in createSandbox policy carry-forward and resume/reuse decisions.
  • cloud-onboard-e2e (high): Useful baseline install/onboard confidence for non-interactive OpenClaw cloud onboarding after policy-selection and onboarding flow changes, though it uses custom npm,pypi policy presets rather than the Brave/default-preservation path.
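
The agent selection precedence mentioned above (an explicit agent flag wins over the NEMOCLAW_AGENT environment variable) can be sketched as follows. `resolveSelectedAgent` is a hypothetical name, and the trimming and lowercasing are assumptions; the repository's resolver may normalize differently or apply a default.

```typescript
// Hypothetical resolver: explicit flag first, then NEMOCLAW_AGENT, else undefined.
function resolveSelectedAgent(
  agentFlag: string | undefined,
  env: Record<string, string | undefined>,
): string | undefined {
  const fromFlag = agentFlag?.trim();
  if (fromFlag) return fromFlag.toLowerCase();

  const fromEnv = env["NEMOCLAW_AGENT"]?.trim();
  return fromEnv ? fromEnv.toLowerCase() : undefined;
}
```

Passing the environment as a parameter, rather than reading a global, keeps the precedence rule testable without touching the real process environment.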

New E2E recommendations

  • agent-change-resume (high): No existing E2E appears to explicitly start onboarding with one agent, interrupt or retain a resumable session, then resume with a different agent and assert provider/model/router/policy state is cleared, the tracked model-router is stopped, and the sandbox is revalidated/recreated for the new agent.
    • Suggested test: Add an E2E scenario for onboard --resume --agent <different-agent> that transitions OpenClaw↔Hermes and verifies cleared provider/model/policy state, no stale router process, and correct final sandbox agent metadata.
  • stale-brave-policy-removal (medium): Existing brave-search-e2e covers the positive Brave-enabled path. The PR also changes the negative/removal path: stale built-in Brave should not be preserved when web search is not configured, while custom presets named brave must still be preserved.
    • Suggested test: Add an E2E or scenario-suite assertion for re-onboard/resume without BRAVE_API_KEY after a sandbox has the built-in brave policy, verifying the built-in Brave network policy is removed and custom presets are not incorrectly dropped.
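
The custom-versus-built-in distinction described above can be sketched as a small predicate. This is a hedged illustration of what isStaleBuiltinBravePolicyPreset might check, based only on the behavior described in this PR; `PresetContext`, `isStaleBuiltinBrave`, and `filterCarriedForwardPresets` are hypothetical names, and the real function in src/lib/onboard/policy-selection.ts may take different arguments.

```typescript
// Hypothetical context; null webSearchConfig models "Brave search was declined".
interface PresetContext {
  webSearchConfig: object | null;
  customPresetNames: string[];
}

// Built-in "brave" is stale when web search is not configured and
// no custom preset with the same name exists.
function isStaleBuiltinBrave(presetName: string, ctx: PresetContext): boolean {
  if (presetName !== "brave") return false;
  if (ctx.webSearchConfig !== null) return false;
  return !ctx.customPresetNames.includes("brave");
}

// Carry-forward keeps every previously applied preset except a stale built-in brave.
function filterCarriedForwardPresets(presets: string[], ctx: PresetContext): string[] {
  return presets.filter((name) => !isStaleBuiltinBrave(name, ctx));
}
```

For example, with web search declined and no custom presets, `filterCarriedForwardPresets(["npm", "brave"], ctx)` would keep only `npm`, while a custom preset named `brave` would be preserved.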

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: onboard-resume-e2e,brave-search-e2e

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 4478-4523: The new resume/preset flow (functions
resetStepForAgentChange and clearAgentScopedResumeState) should be moved out of
the large core onboarding file into a dedicated onboard helpers module: create a
new module exposing resetStepForAgentChange(session, stepName) and
clearAgentScopedResumeState(session, selectedAgentName) and import them where
used; preserve all logic (normalization via normalizeSandboxAgentName, session
field resets, resetSteps array and lastCompletedStep/lastStepStarted
adjustments) and update callers to use the new helpers so the core onboarding
file no longer grows with this agent-change/session-reset logic.
- Around line 9420-9428: When switching agents in the resume branch, stop the
tracked model-router process before clearing routed session state so we don't
leave a live router with discarded credentials; detect the router PID/credential
fields on the current session and call the routine that stops the tracked router
(e.g. stopTrackedModelRouter or the existing stopModelRouter helper) prior to
calling onboardSession.updateSession and clearAgentScopedResumeState; ensure
routerPid and routerCredentialHash are cleared only after the router has been
stopped so reconcileModelRouter() won’t encounter a healthy endpoint with
unknown credentials.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d2d83640-1f40-4b82-b81b-0970e85c110d

📥 Commits

Reviewing files that changed from the base of the PR and between 11b1937 and cb653bb.

📒 Files selected for processing (7)
  • src/lib/agent/defs.test.ts
  • src/lib/onboard.ts
  • src/lib/onboard/policy-selection.ts
  • test/onboard-policy-suggestions.test.ts
  • test/onboard-preset-diff.test.ts
  • test/onboard.test.ts
  • test/policy-tiers-onboard.test.ts

Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/onboard.ts
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26142155076
Target ref: cb653bbc9bda834cfc5dada51ac709db0dde3afd
Workflow ref: main
Requested jobs: onboard-resume-e2e,network-policy-e2e,brave-search-e2e,hermes-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job Result
brave-search-e2e ✅ success
hermes-e2e ✅ success
network-policy-e2e ✅ success
onboard-resume-e2e ✅ success

Contributor

@jyaunches jyaunches left a comment


Thanks for the fix — the agent-change resume behavior and Brave policy handling both look directionally correct, and the advisor-required E2Es passed against cb653bbc9bda834cfc5dada51ac709db0dde3afd (onboard-resume-e2e, network-policy-e2e, brave-search-e2e, hermes-e2e).

One blocker before merge:

🔴 Monolith growth / failing budget check: this PR grows src/lib/onboard.ts from 10332 → 10402 lines (+70), and the required onboard-entrypoint-budget check is failing. Please move the new agent-resume state-clearing helpers (clearAgentScopedResumeState / resetStepForAgentChange) into a focused src/lib/onboard/* module and import them from onboard.ts, or otherwise offset the growth by extracting existing onboard logic in the same PR. This is especially important because there is an active onboard extraction stack in flight (#3861, #3868, #3870, #3871, #3872, #3873, #3874, #3876).

Suggestions while touching this:

  • Add a short comment near the resume-agent-change branch explaining why agent mismatch is no longer a hard resume conflict and instead clears agent-scoped state.
  • Consider hoisting the stale built-in Brave predicate to a small helper if we expect more tests/paths to need the custom-vs-built-in distinction.

No security or E2E blockers found beyond the monolith budget failure.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@github-actions
Contributor

github-actions Bot commented May 20, 2026

PR Review Advisor

Recommendation: blocked
Confidence: medium
Analyzed HEAD: f1e26dbc7ed4d1855beeea89ead3257b9c2af33e
Findings: 2 blocker(s), 2 warning(s), 0 suggestion(s)

This is an automated advisory review. A human maintainer must make the final merge decision.

Limitations: Review used the provided deterministic GitHub context and supplied diff; no repository commands, package-manager commands, PR scripts, or tests were executed. The supplied diff is marked truncated if large, so line-specific verification is limited to visible hunks and deterministic metadata. CI and E2E results for current head f1e26db were incomplete in the trusted context. No linked issues were present, so acceptance coverage maps PR body clauses rather than issue clauses.

Workflow run

Full advisor summary

PR Review Advisor

Base: origin/main
Head: HEAD
Analyzed SHA: f1e26dbc7ed4d1855beeea89ead3257b9c2af33e
Recommendation: blocked
Confidence: medium

Directionally aligned with the onboarding resume/policy fix, but blocked by pending CI, BLOCKED merge state/CHANGES_REQUESTED, and missing required E2E evidence for the current head SHA f1e26db.

Gate status

  • CI: pending — GraphQL rollup for head f1e26db shows multiple pending/in-progress/queued contexts, including cli-parity, E2E recommendation, wsl-e2e, PR review advisor, CodeQL, unit-vitest-linux, checks, ShellCheck SARIF, and sandbox image builds; deterministic context also reports 12 pending contexts.
  • Mergeability: fail — GitHub GraphQL reports mergeStateStatus=BLOCKED and reviewDecision=CHANGES_REQUESTED for head f1e26db.
  • Review threads: pass — 3 review thread(s), all resolved; CodeQL useless assignment and CodeRabbit extraction/model-router comments are marked resolved/addressed.
  • Risky code tested: warning — Risky onboarding/host glue paths changed and unit tests were added, but runtime sandbox/policy/model-router behavior still requires required E2E confirmation for the current head SHA.

🔴 Blockers

  • Hard gates are not satisfied for the current head SHA: The PR is not merge-ready because CI is still pending and GitHub reports a blocked merge state with CHANGES_REQUESTED for head f1e26db.
    • Recommendation: Wait for all required checks to complete successfully and for mergeStateStatus/reviewDecision to clear before considering merge.
    • Evidence: GraphQL rollup shows IN_PROGRESS/QUEUED/PENDING checks including cli-parity, E2E recommendation, wsl-e2e, PR review advisor, CodeQL, unit-vitest-linux, checks, ShellCheck SARIF, build-sandbox-images, build-sandbox-images-arm64, and CodeRabbit; mergeStateStatus=BLOCKED and reviewDecision=CHANGES_REQUESTED.
  • Required E2E evidence is missing for head f1e26db: The E2E Advisor requires onboarding resume, routed model-router inference, Brave Search, and cloud onboarding E2E coverage. Available selective E2E comments target earlier SHAs, not the current head f1e26db; the required model-router-provider-routed-inference-e2e job is not shown as passed for any current-head run.
    • Recommendation: Confirm the E2E Advisor required jobs pass for head f1e26db, including model-router-provider-routed-inference-e2e or an explicitly accepted equivalent.
    • Evidence: E2E Advisor required: onboard-resume-e2e, model-router-provider-routed-inference-e2e, brave-search-e2e, cloud-onboard-e2e. The latest passing selective E2E result targets 8639504, while current head is f1e26db.

🟡 Warnings

🔵 Suggestions

  • None.

Acceptance coverage

  • met — refresh saved provider/model/inference state when a resumed onboard switches agents, so Hermes Provider and managed-tool selections cannot leak into an explicit OpenClaw resume: src/lib/onboard.ts detects recordedAgentName !== selectedAgentName during resume, sets forceProviderSelectionForAgentChange, prevents sandbox reuse via !resumeAgentChanged, stops tracked model-router state before clearing, and calls clearAgentScopedResumeState. src/lib/onboard/agent-resume-state.ts clears provider/model/endpoint/credential/Hermes/router/policy fields and resets provider_selection/inference/sandbox/openclaw/agent_setup/policies steps. test/onboard.test.ts covers Hermes-to-OpenClaw state clearing.
  • met — keep Hermes Provider options manifest-scoped to Hermes while allowing agent changes to re-run provider selection instead of hard-failing resume: src/lib/agent/defs.test.ts asserts OpenClaw inferenceProviderOptions is [] and Hermes inferenceProviderOptions is ["hermesProvider"]. src/lib/onboard.ts removes agent mismatch from getResumeConfigConflicts and forces provider selection when resumeAgentChanged is true.
  • met — stop auto-preserving/applying the built-in Brave policy when Brave search is declined, while keeping Brave selectable and preserving custom/explicit policy choices: src/lib/onboard/policy-selection.ts adds isStaleBuiltinBravePolicyPreset, filters built-in brave when webSearchConfig is absent and no custom preset named brave exists, and skips stale built-in brave during non-interactive preservation. Tests in test/onboard-policy-suggestions.test.ts, test/onboard-preset-diff.test.ts, and test/policy-tiers-onboard.test.ts cover removal when declined and explicit custom-mode Brave retention.
  • unknown — npm run build:cli: This validation is claimed in the PR body, but trusted current-head CI is still pending; no completed build:cli result for f1e26db was provided.
  • unknown — npm test -- src/lib/agent/defs.test.ts test/hermes-provider-foundation.test.ts test/onboard-policy-suggestions.test.ts test/policy-tiers-onboard.test.ts test/onboard-preset-diff.test.ts test/onboard.test.ts: This validation is claimed in the PR body, but unit-vitest-linux is queued for the current head in trusted GraphQL context, so completed current-head test evidence is not yet available.
  • partial — Local full test-cli pre-commit/pre-push coverage was attempted twice and failed in unrelated timeout/CLI-dispatch cases outside this change surface; pushed with SKIP=test-cli after the focused suite and non-test hooks passed.: The PR body discloses skipped full test-cli coverage. Trusted current-head CI is still pending, so the focused-suite claim cannot yet substitute for completed CI/E2E gates.

Security review

  • pass — Secrets and Credentials: No hardcoded secrets or credential literals were introduced in the shown diff. The resume change clears credentialEnv/routerCredentialHash from agent-scoped session state rather than logging or exposing secret values.
  • pass — Input Validation and Data Sanitization: No new external deserialization, eval, command-string execution, or URL parsing paths were introduced. Agent names are normalized for resume-state handling, and Brave preset filtering is exact-name logic over existing policy/custom-preset metadata.
  • pass — Authentication and Authorization: No new auth endpoints or authorization checks are introduced. The change reduces cross-agent privilege/state carry-over by clearing Hermes/provider-scoped selections when the selected agent changes.
  • pass — Dependencies and Third-Party Libraries: No new production dependencies or dependency version changes are shown in the diff.
  • pass — Error Handling and Logging: New resume logging reports agent-name changes without printing credential values. stopModelRouterProcess handles already-stopped/missing processes by catching kill errors and returning.
  • pass — Cryptography and Data Protection: Not applicable — no cryptographic operations or algorithms are introduced or modified in this change.
  • warning — Configuration and Security Headers: The PR changes sandbox policy preset carry-forward/removal and sandbox reuse decisions, which are security-boundary-adjacent for network policy. Unit tests cover Brave policy selection, but required E2E proof for the current head SHA is missing.
  • warning — Security Testing: Security-relevant onboarding/policy/model-router paths have unit tests, but required E2E jobs are not shown as passed for f1e26db, and model-router-provider-routed-inference-e2e evidence is absent.
  • warning — Holistic Security Posture: The design improves least-state carry-over across agents, but it touches sandbox lifecycle, network policy presets, and model-router process cleanup. Because CI/E2E gates are incomplete for the current head, overall security posture cannot be fully confirmed.

Test / E2E status

  • Test depth: e2e_required — Runtime/sandbox/infrastructure paths need real execution coverage: src/lib/onboard.ts, src/lib/onboard/agent-resume-state.ts, src/lib/onboard/model-router-process.ts, and src/lib/onboard/policy-selection.ts. Unit tests were added for state clearing and Brave policy selection, but they cannot fully prove sandbox reuse, model-router shutdown, credential routing, and policy application behavior.
  • E2E Advisor: missing
  • Required E2E jobs: onboard-resume-e2e, model-router-provider-routed-inference-e2e, brave-search-e2e, cloud-onboard-e2e
  • Missing for analyzed SHA: onboard-resume-e2e, model-router-provider-routed-inference-e2e, brave-search-e2e, cloud-onboard-e2e

✅ What looks good

  • The PR reduces the onboarding monolith size overall: src/lib/onboard.ts is net-negative and new helper modules isolate resume-state and model-router process logic.
  • Agent-scoped resume state clearing is covered by a focused unit test that checks provider/model/Hermes/router/policy state and step rollback.
  • Brave policy behavior has suggestion and application/removal tests, including explicit custom-mode retention.
  • Prior CodeQL/CodeRabbit review threads are resolved, and the onboard-entrypoint-budget check is successful for the current head rollup.
  • The implementation stops tracked model-router state before clearing routed session state, addressing a previously identified stale-router cleanup risk.

Review completeness

  • Review used the provided deterministic GitHub context and supplied diff; no repository commands, package-manager commands, PR scripts, or tests were executed.
  • The supplied diff is marked truncated if large, so line-specific verification is limited to visible hunks and deterministic metadata.
  • CI and E2E results for current head f1e26db were incomplete in the trusted context.
  • No linked issues were present, so acceptance coverage maps PR body clauses rather than issue clauses.
  • Human maintainer review required: yes

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26172943149
Target ref: d2159316608a13653ea7e5f603d2b0be7b3eda1c
Workflow ref: main
Requested jobs: onboard-resume-e2e,cloud-onboard-e2e,hermes-e2e,brave-search-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
brave-search-e2e ✅ success
cloud-onboard-e2e ⚠️ cancelled
hermes-e2e ⚠️ cancelled
onboard-resume-e2e ⚠️ cancelled

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
src/lib/onboard/model-router-process.ts (2)

49-51: ⚡ Quick win

Prefix unused loop variable with underscore.

The attempt variable is only used to control iteration count and is never read. As per coding guidelines, unused variables should be prefixed with underscore following Biome conventions.

♻️ Proposed fix
-  for (let attempt = 0; attempt < 10; attempt++) {
+  for (let _attempt = 0; _attempt < 10; _attempt++) {
     await new Promise((resolve) => setTimeout(resolve, 500));
     if (!isProcessRunning(pid) && !(await isRouterHealthy(port, 1000))) return;
   }

Same for line 58:

-  for (let attempt = 0; attempt < 5; attempt++) {
+  for (let _attempt = 0; _attempt < 5; _attempt++) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/model-router-process.ts` around lines 49 - 51, The loop uses
an unused loop variable `attempt`; rename it to `_attempt` (e.g., for (let
_attempt = 0; _attempt < 10; _attempt++)) to follow the Biome convention for
unused variables, and make the same change for the second loop noted (the one
similar at line 58); no other logic changes are required—just update the loop
variable name wherever `attempt` is declared in these retry loops that call
isProcessRunning(pid) and isRouterHealthy(port, 1000).
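
For context on the retry loops discussed above, here is a minimal sketch of a SIGTERM→SIGKILL escalation with polling. It is an assumption-laden illustration, not the contents of model-router-process.ts: `ProcessControl` and `stopWithEscalation` are hypothetical names, process control is injected rather than calling Node APIs directly, and the router health probe used alongside the PID check in the real loops is left out. The poll counts (10, then 5) and the 500 ms delay mirror the snippet in this comment.

```typescript
type Signal = "SIGTERM" | "SIGKILL";
type StopResult = Signal | "already-stopped" | "still-running";

// Hypothetical seam around process control so the escalation logic is testable.
interface ProcessControl {
  kill: (pid: number, signal: Signal) => void; // may throw if the process is gone
  isRunning: (pid: number) => boolean;
  sleep: (ms: number) => Promise<void>;
}

async function stopWithEscalation(pid: number, ctl: ProcessControl): Promise<StopResult> {
  if (!ctl.isRunning(pid)) return "already-stopped";

  // Try a graceful stop first, then escalate to a forced kill.
  const stages: Array<[Signal, number]> = [
    ["SIGTERM", 10],
    ["SIGKILL", 5],
  ];

  for (const [signal, polls] of stages) {
    try {
      ctl.kill(pid, signal);
    } catch {
      // Simplification: any kill error is treated as "process already exited".
      return "already-stopped";
    }
    for (let _attempt = 0; _attempt < polls; _attempt++) {
      await ctl.sleep(500);
      if (!ctl.isRunning(pid)) return signal; // report which signal sufficed
    }
  }
  return "still-running";
}
```

In a Node runtime, `kill` could be backed by `process.kill(pid, signal)` and `isRunning` by `process.kill(pid, 0)` inside a try/catch, which is a common liveness check; whether the repository does exactly this is not shown here.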

12-12: ⚡ Quick win

Prefer top-level ESM import over inline require.

Using require("http") inside the function is inconsistent with the rest of the codebase style and the type import on line 21. Move to a top-level import for consistency.

♻️ Proposed fix
 // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 // SPDX-License-Identifier: Apache-2.0

+import http from "node:http";
 import type { Session } from "../state/onboard-session";

Then update line 12:

-  const http = require("http");

And line 20-21:

   const request = http
-      .get(`http://127.0.0.1:${port}/health`, (res: import("node:http").IncomingMessage) => {
+      .get(`http://127.0.0.1:${port}/health`, (res: http.IncomingMessage) => {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/model-router-process.ts` at line 12, Replace the inline
CommonJS usage of const http = require("http") with a top-level ESM import to
match the codebase style: remove the require and add a top-level import (e.g.
import * as http from "http") and ensure any existing type references (like
IncomingMessage/ServerResponse) continue to import from "http" as needed; update
uses of the local const http variable accordingly in model-router-process.ts so
there is no inline require left.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/lib/onboard/model-router-process.ts`:
- Around line 49-51: The loop uses an unused loop variable `attempt`; rename it
to `_attempt` (e.g., for (let _attempt = 0; _attempt < 10; _attempt++)) to
follow the Biome convention for unused variables, and make the same change for
the second loop noted (the one similar at line 58); no other logic changes are
required—just update the loop variable name wherever `attempt` is declared in
these retry loops that call isProcessRunning(pid) and isRouterHealthy(port,
1000).
- Line 12: Replace the inline CommonJS usage of const http = require("http")
with a top-level ESM import to match the codebase style: remove the require and
add a top-level import (e.g. import * as http from "http") and ensure any
existing type references (like IncomingMessage/ServerResponse) continue to
import from "http" as needed; update uses of the local const http variable
accordingly in model-router-process.ts so there is no inline require left.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 151561f4-e716-4727-afe2-e3946a3acde8

📥 Commits

Reviewing files that changed from the base of the PR and between cb653bb and 4cd87a2.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • src/lib/onboard/agent-resume-state.ts
  • src/lib/onboard/model-router-process.ts

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26173052957
Target ref: 4cd87a2e820ba4375c5d16857488a411958ee876
Workflow ref: main
Requested jobs: onboard-resume-e2e,hermes-e2e,brave-search-e2e,network-policy-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job Result
brave-search-e2e ✅ success
hermes-e2e ✅ success
network-policy-e2e ✅ success
onboard-resume-e2e ✅ success

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26174948896
Target ref: 27ca3ad49e236f980affadb98ee67b1b8a234f6f
Workflow ref: main
Requested jobs: onboard-resume-e2e,cloud-onboard-e2e,hermes-e2e,brave-search-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job Result
brave-search-e2e ✅ success
cloud-onboard-e2e ✅ success
hermes-e2e ✅ success
onboard-resume-e2e ✅ success

…pe-brave-policy

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26192033420
Target ref: 863950484d71ce9eeda7a96a2aa37e0bf2acdb1e
Workflow ref: main
Requested jobs: onboard-resume-e2e,brave-search-e2e,cloud-onboard-e2e
Summary: 3 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| brave-search-e2e | ✅ success |
| cloud-onboard-e2e | ✅ success |
| onboard-resume-e2e | ✅ success |

…pe-brave-policy

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/lib/onboard.ts (1)

9316-9329: Please run an onboarding E2E for the agent-change resume path.

This branch now tears down tracked router state, clears agent-scoped session data, and changes resume-time sandbox reuse/recreation semantics. A targeted onboard lifecycle E2E such as sandbox-operations-e2e or channels-stop-start-e2e would give much better coverage than unit-only validation here.

As per coding guidelines, src/lib/onboard.ts: "This file contains core onboarding logic. Changes here affect the full sandbox creation and configuration flow."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 9316 - 9329, Add an E2E test exercising the
agent-change resume path to validate the new teardown and session-clear
behavior: create or update a test (e.g., extend sandbox-operations-e2e or
channels-stop-start-e2e) that starts a sandbox with a recorded agent, triggers a
resume with a different selected agent, and asserts the tracked router teardown
via stopTrackedModelRouterForAgentChange, that onboardSession.updateSession
calls clearAgentScopedResumeState for the new selectedAgentName, and that
sandbox provider selection/recreation and sandbox reuse semantics
(formatSandboxAgentName changes) behave as expected (no residual scoped state,
provider refreshed, and resume succeeds). Ensure the E2E covers the router port
fallback (loadBlueprintProfile("routed")?.router.port || 4000) and verifies the
resumed sandbox lifecycle completes end-to-end.
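
As a rough sketch of the behaviour such a test would assert, the snippet below models the router port fallback and the clearing of agent-scoped state on an agent change. The `OnboardSession` shape, the step names, and both function bodies are hypothetical stand-ins; only the names `clearAgentScopedResumeState` and the `|| 4000` fallback come from the review text, and the real helpers in src/lib/onboard/agent-resume-state.ts may differ.

```typescript
// Hypothetical shapes for illustration; not the PR's actual types.
interface RouterProfile {
  router?: { port?: number };
}

type StepStatus = "pending" | "complete";

interface OnboardSession {
  agent: string;
  provider?: string;
  model?: string;
  sandboxName?: string;
  steps: Record<string, StepStatus>;
}

const DEFAULT_ROUTER_PORT = 4000;

// Steps assumed to depend on the selected agent (names are made up).
const AGENT_SCOPED_STEPS = ["provider_selection", "policy_selection", "sandbox"];

// Mirrors `loadBlueprintProfile("routed")?.router.port || 4000`, with an
// extra `?.` so a profile without a router block also falls back.
function resolveRouterPort(profile: RouterProfile | null | undefined): number {
  return profile?.router?.port || DEFAULT_ROUTER_PORT;
}

// Returns a new session with agent-scoped fields dropped and the dependent
// steps reset, or the same session when the agent did not change.
function clearAgentScopedResumeState(
  session: OnboardSession,
  selectedAgentName: string,
): OnboardSession {
  if (session.agent === selectedAgentName) return session;
  const steps: Record<string, StepStatus> = { ...session.steps };
  for (const name of AGENT_SCOPED_STEPS) {
    if (name in steps) steps[name] = "pending";
  }
  // provider, model, and sandboxName are intentionally not copied, so
  // provider selection re-runs and the old sandbox is not reused.
  return { agent: selectedAgentName, steps };
}
```

An E2E along the lines described would then check the same outcomes against the real session file: no residual provider or model, the provider step pending again, and a fresh sandbox for the new agent.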

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 17235573-6de8-4035-ba27-975667b7c748

📥 Commits

Reviewing files that changed from the base of the PR and between 27ca3ad and f1e26db.

📒 Files selected for processing (1)
  • src/lib/onboard.ts

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26196733519
Target ref: f1e26dbc7ed4d1855beeea89ead3257b9c2af33e
Workflow ref: main
Requested jobs: onboard-resume-e2e,brave-search-e2e
Summary: 2 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| brave-search-e2e | ✅ success |
| onboard-resume-e2e | ✅ success |

Contributor

@jyaunches jyaunches left a comment

Thanks for addressing the review feedback. The monolith growth blocker is resolved ( is now net-negative vs current and is green), the agent-scoped resume helpers were extracted, and the tracked model-router cleanup now runs before clearing routed session state.

I also kicked off the remaining selective regression E2E for against the current PR head :

https://github.com/NVIDIA/NemoClaw/actions/runs/26197650629

Approving from code review; please make sure that E2E finishes green before merge.

@jyaunches
Contributor

Follow-up on my approval comment above (the shell ate the backticked text): the monolith growth blocker is resolved: src/lib/onboard.ts is now net-negative vs current main, and onboard-entrypoint-budget is green. The remaining selective regression E2E I kicked off is model-router-provider-routed-inference-e2e against current PR head f1e26dbc7ed4d1855beeea89ead3257b9c2af33e:

https://github.com/NVIDIA/NemoClaw/actions/runs/26197650629

@ericksoa ericksoa merged commit 36d8e2d into main May 21, 2026
36 of 37 checks passed