-
-
Notifications
You must be signed in to change notification settings - Fork 108
fix: kill bash tool's full process tree on timeout #2482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+37
−6
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 When the grace timer force-resolves
doBash(), it never detaches from the live child:child.stdout/child.stderrkeep their'data'listeners and remain in flowing mode, anddetached:truewithout a matchingchild.unref()keeps the parent's event loop ref-counted. In the exact scenario this timer is designed for (a descendant escaped the process group viasetsidand is still holding stdio open), this leaves the per-tool-call deadlock fixed but re-introduces a per-run hang at shutdown —runApiMode/runGuideModeinorchestrate.mjscomplete naturally without callingprocess.exit, so Node will wait on the open pipes indefinitely. Fix is one line in the grace-timer body:child.stdout?.destroy(); child.stderr?.destroy(); child.unref();beforefinish(...).Extended reasoning...
What the bug is
The grace timer added at lines 219-228 force-resolves the
doBashPromise 5s after SIGKILL fires, butfinish()only resolves the Promise and clears the two timers. It never disengages from the still-live child:child.unref(), despite the spawn usingdetached: true(line 208).child.stdout.destroy()/child.stderr.destroy().'data'listeners (lines 241-242) stay attached and the streams remain in flowing mode.Why this matters
Per Node's documentation, a spawned child keeps the parent's event loop ref-counted until
'close'fires;'close'fires only when all stdio streams have closed. The graceTimer exists precisely because some descendant escaped the process group (e.g., via its ownsetsidcall) and is still holding pipe write-ends open after the SIGKILL — i.e., the very scenario where'close'will not fire naturally. Withoutdestroy()on the read-end andunref()on the child, libuv's active-handle count never drops.Why orchestrate.mjs makes this user-visible
orchestrate.mjs:66-67:Both
runApiMode(ends at line 158 withconsole.log('Usage totals:'...)) andrunGuideMode(ends at line 241 the same way) fall off the end without callingprocess.exit()on the success path. Top-level execution then attempts to drain the event loop. A leaked, ref-counted child with active piped streams keeps libuv's loop alive, sonodewill hang at end-of-run instead of exiting — re-introducing the very deadlock this PR was designed to prevent, just shifted from per-tool-call to per-process granularity.A secondary concern: the
'data'handlers are closures over the localstdout/stderrstrings. Afterfinish()returns, the orphan can keep writing and those strings keep growing unbounded —cap()is applied at result-build time only, so it does not bound the in-memory growth post-resolution.Step-by-step proof
run_bashwithtimeout_seconds: 60on a command whose tree contains a descendant that callssetsid(or is otherwise orphaned out of the bash pgrp).timerfires →killTree('SIGKILL')kills bash and any descendants still in the pgrp. The escaped descendant survives and continues to hold the pipe write-ends.'close'does NOT fire on the child, because not all stdio handles are closed.graceTimerfires →finish({ ok:false, timed_out:true, ... })resolves the Promise with snapshots ofstdout/stderrand clears both timers.runApiMode/runGuideModereturns normally.awaitresolves; the script falls off the end. Node tries to exit. It cannot: libuv still sees the live child handle (nounref) and the two open pipe handles in flowing mode (active'data'listeners, nodestroy).stdout/stderrstrings grow unboundedly until the orphan finally exits or something kills the process.Why existing code does not prevent this
The
'close'handler self-cleans, but its precondition (all stdio closed) is exactly what fails in the grace-timer scenario. The SIGKILL path doesn't help here either — by hypothesis the descendant ignored or escaped the pgrp signal.Fix
In the
graceTimercallback, before/aroundfinish(...):This forcibly closes the parent's read-ends (so libuv stops waiting on the pipes) and tells the event loop to stop ref-counting the orphaned child handle, allowing Node to exit cleanly at end-of-run. Same treatment in the SIGKILL timer body is reasonable belt-and-braces, though strictly only the grace path requires it.