Aurashk/improve process manager terminate behaviour #733

Aurashk · 2025-12-05T15:24:26Z

Description

Fixes #658

Modifies both flavours of SSH process manager (shell and paramiko) to query/kill remote processes directly rather than through the local ssh client. This is achieved by running headless remote processes and storing the pid of the remote process in metadata. Then the pid can be used to send signals through ssh directly to the remote process.
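For illustration only, a minimal sketch of the idea rather than the code in this PR: start the remote command detached, record its PID in a file, then use a second ssh call to signal that PID. The helper names `build_launch_command` and `build_signal_command`, the PID file path and the use of `nohup` are all assumptions made for the example.

```python
import shlex


def build_launch_command(host: str, command: str, pid_file: str) -> list[str]:
    """Build an ssh argv that starts `command` detached on `host` and records a PID.

    Hypothetical helper: assumes a POSIX shell and `nohup` on the remote host.
    """
    remote_script = (
        # Run the command in the background, detached from the ssh session.
        f"nohup sh -c {shlex.quote(command)} > /dev/null 2>&1 < /dev/null & "
        # $! is the PID of the background job that was just started.
        f"echo $! > {shlex.quote(pid_file)}"
    )
    return ["ssh", host, remote_script]


def build_signal_command(host: str, pid: int, signal_name: str = "TERM") -> list[str]:
    """Build an ssh argv that sends `signal_name` straight to the remote PID."""
    return ["ssh", host, f"kill -s {signal_name} {pid}"]
```

Note that `$!` is the PID of the backgrounded wrapper shell, so a real implementation may need `exec` or a process group to make sure the signal reaches the actual command.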

This has some desired effects:

We get informative exit codes from the remote processes rather than the ssh client processes
More control over cleanup, because we can send signals directly; otherwise all we can do is kill the client, which sends a SIGHUP to the remote process
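As a rough sketch of why the remote exit code is more informative, the snippet below decodes a shell-style exit status. It relies on two conventions: shells commonly report a command killed by signal N as 128 + N, and the ssh client itself exits with 255 when it hits an error. The function name `describe_exit_code` is hypothetical, and a remote command that genuinely exits with 255 would be misreported here.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Turn a shell-style exit status into a human-readable description."""
    if code == 0:
        return "exited cleanly"
    if code == 255:
        # ssh uses 255 for its own errors (assumption: the command did not).
        return "ssh itself failed (connection or authentication error)"
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with error code {code}"
```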

Less desirable effects:

Some commands in drunc-unified-shell are slower. For example, on my laptop ps now takes a couple of seconds rather than being instant, and killing each process takes around a second. I would expect some slowdown, since we went from completely local process communication to talking to each process remotely through ssh. The implementation can probably be optimised further if this is a concern.

Note: There is an adjacent comment in the issue about fixing the terminate order to match K8s. I think that would be straightforward to add to this PR, but I will wait for feedback on the approach first.

Type of change

Documentation (non-breaking change that adds or improves the documentation)
New feature (non-breaking change which adds functionality)
Optimization (non-breaking, back-end change that speeds up the code)
Bug fix (non-breaking change which fixes an issue)
Breaking change (whatever its nature)

Key checklist

All tests pass (eg. python -m pytest)
Pre-commit hooks run successfully (eg. pre-commit run --all-files)

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added or an issue has been opened to tackle that in the future.
(Indicate issue here: # (issue))


jamesturner246

Not sure if you can still see our meeting chat any more, but I think I have signal forwarding working without any monitor threads or other hacks.

The secret is to run the ssh client with the -t option, which forces the remote side to allocate a PTY and run the command inside it, rather than running the command directly without one. The advantage is that signals like SIGTERM et al. are actually forwarded to the remote command properly, thanks to the PTY's built-in signal handling.

So all it means in practice is using ssh -t ... instead of ssh ..., and it should work without the hacks.

jamesturner246 · 2025-12-08T11:05:40Z

One caveat though: SIGTERMing the local ssh means that a SIGHUP is what actually appears on the remote command side. But for this use case I don't think it matters which of SIGTERM or SIGHUP reaches the remote command, just that a TERM-ish signal reaches the remote side reliably.

jamesturner246 · 2025-12-08T11:13:05Z

One can in fact be EXTRA safe (probably would recommend) by sending the ^C byte through explicitly, and letting the remote program deal with shutting itself down. The ssh return code would be the return code of the remote command. It would fall back to getting SIGHUP if that fails though.
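A sketch of what that could look like, assuming the ssh client was started with a piped stdin and a forced PTY (for example `subprocess.Popen(["ssh", "-tt", host, command], stdin=subprocess.PIPE)`; ssh needs the doubled `-tt` to allocate a PTY when its own stdin is not a terminal). The function name `interrupt_then_terminate` and the grace period are made up for the example.

```python
import subprocess

CTRL_C = b"\x03"  # ETX byte; a PTY in its default mode turns it into SIGINT


def interrupt_then_terminate(proc, grace_seconds: float = 5.0) -> int:
    """Ask the remote command to stop via ^C, falling back to terminating ssh.

    `proc` is expected to behave like subprocess.Popen with a binary stdin pipe.
    """
    try:
        proc.stdin.write(CTRL_C)
        proc.stdin.flush()
    except (OSError, ValueError):
        pass  # stdin already closed or broken; just wait for the exit below
    try:
        # If the remote command shuts itself down, ssh returns its exit code.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Fallback: terminate the local client, so the remote side sees SIGHUP.
        proc.terminate()
        return proc.wait()
```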

Aurashk · 2026-01-16T11:06:00Z

> One caveat though: SIGTERMing the local ssh means that a SIGHUP is what actually appears on the remote command side. But for this use case I don't think it matters which of SIGTERM or SIGHUP reaches the remote command, just that a TERM-ish signal reaches the remote side reliably.

From my understanding of #658 and #649, this is exactly the problem we want to solve. We want to explicitly send signals to the remote process, but the current implementation sends signals to the local process. If a remote process gets a SIGHUP it is ambiguous, as it just means the connection/terminal was closed from the client side. The current intended behaviour attempts to shut everything down cleanly with a SIGQUIT (but this doesn't work because it SIGQUITs the client instead).

This is all my interpretation of what the code should be doing, based on the issues and the current implementation. Ideally we should have these nuances documented somewhere that makes it clear what boot and kill should do. Maybe add a few lines in process_manager.md.

Aurashk · 2026-01-16T11:15:13Z

> Not sure if you can still see our meeting chat any more, but I think I have signal forwarding working without any monitor threads or other hacks.
>
> The secret is to run the ssh client with the -t option, which forces the remote side to allocate a PTY and run the command inside it, rather than running the command directly without one. The advantage is that signals like SIGTERM et al. are actually forwarded to the remote command properly, thanks to the PTY's built-in signal handling.
>
> So all it means in practice is using ssh -t ... instead of ssh ..., and it should work without the hacks.

Monitor threads are used for a few things in the shell ssh process manager, which we should probably document better: reading logs from remote processes asynchronously, checking that the remote process is alive, and running a callback function when the remote process exits. I don't think it's that straightforward to remove them.
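To make those roles concrete, here is a minimal sketch of such a monitor thread. It is not the process manager's actual code; `start_monitor`, `on_line` and `on_exit` are hypothetical names, and a local Python child stands in for the ssh client.

```python
import subprocess
import sys
import threading
from typing import Callable


def start_monitor(
    proc: subprocess.Popen,
    on_line: Callable[[str], None],
    on_exit: Callable[[int], None],
) -> threading.Thread:
    """Stream a process's output line by line and run a callback when it exits."""

    def _run() -> None:
        assert proc.stdout is not None
        # Blocks in the background thread, so the caller is never stalled on logs.
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
        # stdout reached EOF: collect the exit code and notify the owner.
        on_exit(proc.wait())

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "print('hello'); raise SystemExit(3)"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    monitor = start_monitor(child, print, lambda code: print(f"exit code {code}"))
    monitor.join()
```

The liveness check in this sketch would simply be `proc.poll() is None` from the owning thread.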

We are already using -t in all ssh commands. I actually think we might be overusing it and should use -T for some of the ssh calls for better efficiency. I'm not too sure why -t is already used everywhere; it may be that some of the processes require a full interactive shell, so the process manager wouldn't work robustly without it.
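One way this could be expressed, purely as a sketch: choose the flag per call, keeping -t (force PTY allocation) for commands that need a terminal and -T (disable PTY allocation) for short non-interactive calls such as sending a signal. `build_ssh_argv` and its `needs_tty` parameter are hypothetical, not existing process manager code.

```python
def build_ssh_argv(host: str, remote_command: str, needs_tty: bool) -> list[str]:
    """Build an ssh argv, allocating a remote PTY only when the call needs one."""
    tty_flag = "-t" if needs_tty else "-T"
    return ["ssh", tty_flag, host, remote_command]
```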

Aurashk added 5 commits December 4, 2025 17:26

add support for saving metadata ssh process manager

95367c9

improve kill handling to kill remote processes directly via ssh via sigterm

improve boot efficiency

1423994

Merge branch 'develop' into aurashk/improve-process-manager-terminate-behaviour

ce2e65c

make shell process manager a background remote process to fix remote signal behaviour

a97a724

add tmp/ fall back for ssh process metadata

d465e76

Aurashk requested review from PawelPlesniak and jamesturner246 December 5, 2025 16:44

Aurashk marked this pull request as ready for review December 8, 2025 10:10

jamesturner246 requested changes Dec 8, 2025


improve commenting and naming

ed3ea5e
