Unblock WaitForExit when a grandchild inherits redirected pipes#1223
Unblock WaitForExit when a grandchild inherits redirected pipes#1223LukeButters wants to merge 4 commits into
Conversation
SilentProcessRunner.ExecuteCommand could hang indefinitely on cancellation when the launched process spawned a grandchild that inherited the redirected stdout/stderr handles and outlived the immediate child (observed in a real dump: PowerShell -> AutoReport spawning a child and exiting). Process.Kill(true) walks descendants by ParentProcessId — once the intermediate parent dies, the grandchild is no longer reachable from the walk, so the pipes stay open and the parameterless WaitForExit() blocks forever waiting on the async readers to drain. On cancellation, after the kill, close the Process's redirected streams so the in-flight ReadAsync calls throw and the AsyncStreamReader signals EOF, letting WaitForExit return. Gated on asyncReadsStarted so the early-cancel path (before BeginOutputReadLine) is unaffected. Adds a Windows integration test that reproduces the hang via a PowerShell -> cmd -> ping chain where cmd exits after `start /b` returns, breaking the PPID chain and bypassing Kill(true)'s descendant walk. The test guards against a true infinite hang by wrapping the ExecuteCommand call in Task.Wait(30s). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||
| process.BeginOutputReadLine(); | ||
| process.BeginErrorReadLine(); | ||
| Volatile.Write(ref asyncReadsStarted, true); |
There was a problem hiding this comment.
The AI says:
● asyncReadsStarted is written by one thread (the thread running ExecuteCommand) and read by another (whoever fires cancel.Cancel(), which invokes the Register callback synchronously on the cancelling thread). It's a textbook cross-thread bool flag.
Without any synchronisation:
- JIT can hoist the read. The compiler could cache the captured local in a register and not re-fetch it. Less likely in a one-shot lambda, but not guaranteed.
- CPU can reorder. On weak memory models (ARM — the project's RIDs include linux-arm64/linux-arm), a plain read might not observe a prior write from another core without a memory fence.
Volatile.Write does a release fence after writing true; Volatile.Read does an acquire fence before reading. That pairs to give you the happens-before guarantee you need: if the writer set it before the cancel fired, the callback is guaranteed to see true.
Why not other options:
- volatile keyword — only allowed on fields, not locals. asyncReadsStarted is a captured local, so it's lifted into a compiler-generated closure class field that you can't decorate.
- Interlocked — doesn't support bool. You'd have to widen to int, which is uglier for no real benefit here.
- lock — overkill for a one-way flag.
Honest caveat: in this specific case it's belt-and-braces. CancellationTokenSource.Cancel() internally uses interlocked operations that act as a full fence, so if Thread A's write happens before Cancel(), other threads observing the cancelled state will also see the write
transitively. So a plain bool would probably work in practice. But Volatile.Read/Write makes the cross-thread intent explicit, is cheap, and is the correct portable answer that doesn't rely on knowing CTS internals — worth the two lines.
| // descendants by ParentProcessId and finds NOTHING (cmd's PID is gone, so | ||
| // ping no longer appears as a descendant). Without the fix that closes | ||
| // StandardOutput/StandardError after Kill, WaitForExit() hangs forever. | ||
| var psScript = @" |
There was a problem hiding this comment.
yep AI generated.
The important thing is that the test failed before hand, which the AI tells me it did :D
There was a problem hiding this comment.
Pull request overview
Fixes a hang in SilentProcessRunner.ExecuteCommand where, on cancellation, a grandchild process that inherited the redirected stdout/stderr pipes and survived Process.Kill(true) (because the PPID chain was broken when the intermediate parent already exited) would keep the pipe write-ends open. The parameterless WaitForExit() blocks waiting for the async readers to drain, so cancellation never completes. The fix closes the parent Process on the cancellation callback so EOF propagates to the async readers and WaitForExit returns.
Changes:
- In the cancellation callback, after
DoOurBestToCleanUp, callprocess.Close()to release the redirected pipe handles so async readers see EOF andWaitForExit()can return. - Gate the new
process.Close()on anasyncReadsStartedflag (set withVolatile.WriteafterBeginOutputReadLine/BeginErrorReadLine) so the early-cancel path is unchanged. - Add a Windows-only integration test reproducing the hang via a PowerShell → cmd → ping chain (cmd exits after
start /b, orphaning ping with the inherited pipes), wrapped in a 30sTask.Waitto fail rather than hang.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| source/Octopus.Tentacle.Core/Util/CommandLine/SilentProcessRunner.cs | Closes the process on cancel (after async reads are wired up) to unblock WaitForExit when a grandchild holds the redirected pipes. |
| source/Octopus.Tentacle.Tests.Integration/Util/SilentProcessRunnerFixture.cs | New Windows integration test reproducing the grandchild/redirected-pipe hang scenario, with PID-file based orchestration and cleanup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Reproduces a dump scenario: a child process spawns a grandchild that | ||
| // inherits the redirected stdout/stderr handles AND is detached from the | ||
| // .NET-managed job (CREATE_BREAKAWAY_FROM_JOB). When cancellation fires: | ||
| // - Hitman.Kill terminates the immediate child. | ||
| // - The grandchild survives and keeps the write-end of the pipes open. | ||
| // - Process.WaitForExit() blocks forever waiting for the readers to EOF. | ||
| // The fix is to close StandardOutput/StandardError after Kill so the | ||
| // readers see EOF on our end and WaitForExit returns. |
[Retry(3)] was cargo-culted from a neighbouring cancellation test; the new test's assertion is deterministic so retries only hide real regressions. Bump the grandchild-spawn deadline from 30s to 60s to cover slow CI boxes (AV scanning newly-launched cmd.exe/ping.exe, Roslyn Add-Type compile, WMI lookup) — a setup timeout now means something is actually broken. Use Task.Delay instead of Thread.Sleep so the test does not block a thread-pool thread while waiting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The header comment was left over from an earlier iteration that tried to detach the grandchild via CREATE_BREAKAWAY_FROM_JOB. The script only passes CREATE_NO_WINDOW (0x08000000); what actually makes Kill(true) miss the grandchild is the PPID chain breaking when cmd exits after `start /b` returns. The comment lower down already describes this correctly, so drop the stale header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Ensure you make a public issue on this one |
Replace the embedded Add-Type C# (STARTUPINFO, PROCESS_INFORMATION, CreateProcessW, GetStdHandle, SetHandleInformation, CloseHandle plus the StringBuilder commandline quote-juggling) with a plain System.Diagnostics.Process.Start. .NET's Process.Start sets bInheritHandles=true and passes parent-inherited handles via GetStdHandle for any std stream not explicitly redirected -- as long as at least one IS redirected. Redirecting stdin (which we don't use) flips the switch, so cmd inherits PowerShell's stdout/stderr (which came from SilentProcessRunner's pipes) for free. Verified the simplified script still hangs the unfixed code at the 30s assertion deadline, and passes in ~1s with the fix in place. -84 LOC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
xwipeoutx
left a comment
There was a problem hiding this comment.
I feel like we're solving the wrong problem... the control flow itself is kinda borked, right? There's chances for stale bools all over it.
Like, should DoOurBestToCleanUp call Close() after killing it? It'll cause the BeginOutputReadLine to throw but... who cares? We're dead anyway?
I think I'd need to dive in further to actually understand it
| // active ReadAsync calls never see EOF and the call | ||
| // hangs forever. Only valid once BeginOutputReadLine | ||
| // has wired up the AsyncStreamReaders. | ||
| if (Volatile.Read(ref asyncReadsStarted)) |
There was a problem hiding this comment.
[formatting] Can you improve the formatting here? Jamming a whole try/catch onto a single line to avoid a brace is ... a decision.
| @@ -128,17 +128,30 @@ void WriteData(Action<string> action, ManualResetEventSlim resetEvent, DataRecei | |||
| process.Start(); | |||
|
|
|||
| var running = true; | |||
There was a problem hiding this comment.
I ran this through Claude and I think running has similar problems... can running get hoisted and cause issues?
Honestly I'm struggling to follow along here, and this is still in draft so don't want to spend much time on it.
| if (cancel.IsCancellationRequested) | ||
| DoOurBestToCleanUp(process, error); | ||
|
|
||
| process.BeginOutputReadLine(); |
There was a problem hiding this comment.
There's still race conditions here - we can cancel while we're beginning still. Again dunno if this is a problem... I think this reduces the window, but I'm not convinced the volatile gets us much.
Fixes: EFT-3354
SilentProcessRunner.ExecuteCommand could hang indefinitely on cancellation when the launched process spawned a grandchild that inherited the redirected stdout/stderr handles and outlived the immediate child (observed in a real dump: PowerShell -> AutoReport spawning a child and exiting). Process.Kill(true) walks descendants by ParentProcessId — once the intermediate parent dies, the grandchild is no longer reachable from the walk, so the pipes stay open and the parameterless WaitForExit() blocks forever waiting on the async readers to drain.
On cancellation, after the kill, close the Process's redirected streams so the in-flight ReadAsync calls throw and the AsyncStreamReader signals EOF, letting WaitForExit return. Gated on asyncReadsStarted so the early-cancel path (before BeginOutputReadLine) is unaffected.
Adds a Windows integration test that reproduces the hang via a PowerShell -> cmd -> ping chain where cmd exits after
start /breturns, breaking the PPID chain and bypassing Kill(true)'s descendant walk. The test guards against a true infinite hang by wrapping the ExecuteCommand call in Task.Wait(30s).Are you a customer of Octopus Deploy? Please contact our support team so we can triage your PR, so that we can make sure it's handled appropriately.
Background
Results
Fixes https://github.com/OctopusDeploy/OctopusTentacle/issues/... (optional public issue)
See How we use GitHub Issues (including this flowchart
Before
After
How to review this PR
Quality ✔️
Pre-requisites