-
Notifications
You must be signed in to change notification settings - Fork 5
Aurashk/improve process manager terminate behaviour #733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Aurashk/improve process manager terminate behaviour #733
Conversation
improve kill handling to kill remote processes directly via ssh via sigterm
jamesturner246
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you can still see our meeting chat any more, but I think I have signal forwarding working without any monitor threads or other hacks.
The secret is to run the ssh client with the -t option, which forces remote to create a PTY, and then run the command inside of that, instead of without, where it runs the command directly. Advantage of this way is signals like SIGTERM et al are actually forwarded to remote command properly, thanks to the PTY's built-in signal handling.
So all it means in practice is using ssh -t ... instead of ssh ..., and it should work without the hacks.
|
One caveat though, SIGTERMing the local ssh means that in fact a SIGHUP is actually what appears on the remote command side. Butt from this use case I don't think it matters which of SIGTERM or SIGHUP reaches remote command, just the fact that a TERM-ish signal is reaching remote reliably. |
|
One can in fact be EXTRA safe (probably would recommend) by sending the |
From my understanding of #658 and #649 this is exactly the problem we want to solve. We want to explicitly send signals to the remote process, but the current implementation send signals to the local process. If a remote process gets a SIGHUP it leads to ambiguity, as it just means the connection/terminal was closed from the client side. The current intended behaviour attempts to shut everything down cleanly with a This is all my interpretation of what the code should be doing from issues and looking at the current implementation though, I'd say ideally we should have these nuances documented somewhere it's clear what |
Monitor threads are used for a few things in the shell ssh process manager which we should probably document better (reading logs from remote processes asynchronously, checking the remote process is alive, running a callback function when the remote process exits), I don't think it's that straightforward to remove them. We are already using |
Description
Fixes #658
Modifies both flavours of SSH process manager (shell and paramiko) to query/kill remote processes directly rather than through the local ssh client. This is achieved by running headless remote processes and storing the pid of the remote process in metadata. Then the pid can be used to send signals through ssh directly to the remote process.
This has some desired effects:
SIGHUPto the remote processLess desirable effects:
drunc-unified-shellare slower e.g. on my laptoppsnow takes a couple of seconds rather than instant. Killing each process is around a second. I would expect some degree of slowdown as we went from completely local process communication to having to talk remotely through ssh to each process. The implementation can probably be optimised better if this is a concern.Note: There is an adjacent comment in the issue about fixing the terminate order to match K8s. I think that would be straightforward to add to this PR but I will wait for feedback on the approach first
Type of change
Key checklist
python -m pytest)pre-commit run --all-files)Further checks
(Indicate issue here: # (issue))