Fix check_ssl_connection() function #866
There may still be an issue in |
| for _ in range(10): | ||
| code = dp.poll() |
I could also use dp.wait(timeout) instead of dp.poll() and a for loop.
Yes, please. It is a little clearer.
Ok, I will do dp.wait(timeout) then.
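The two approaches discussed here can be compared in a minimal sketch. It uses a plain `subprocess.Popen` object and a stand-in child command rather than the project's `detached_subprocess()` helper; the function names `wait_with_poll` and `wait_with_timeout` and the timing values are assumptions made for illustration only.

```python
import subprocess
import sys
import time


def wait_with_poll(proc, attempts=50, delay=0.1):
    """Poll repeatedly; return the exit code, or None if still running."""
    for _ in range(attempts):
        code = proc.poll()
        if code is not None:
            return code
        time.sleep(delay)
    return None


def wait_with_timeout(proc, timeout=5.0):
    """Block until the child exits; return the exit code, or None on timeout."""
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


# Stand-in child process (not the real openssl command): exits with code 3.
cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
poll_code = wait_with_poll(subprocess.Popen(cmd))
wait_code = wait_with_timeout(subprocess.Popen(cmd))
```

Both helpers report the same exit code for a child that finishes in time; the `wait()` variant simply avoids the explicit loop and sleep.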
| @@ -202,7 +202,15 @@ def check_ssl_connection(host, port, silent=False): | |||
| expose_output = not silent | |||
| with detached_subprocess(ssl_check_cmd, expose_output=expose_output) as dp: | |||
| time.sleep(1) | |||
This sleep is not needed anymore.
| raise SSLHealthcheckError('OpenSSL connection verification failed') | ||
| timeout = 20 | ||
| try: | ||
| dp.wait(timeout=timeout) |
This consistently times out :/
Based on https://docs.python.org/3/library/subprocess.html#subprocess.Popen.wait I suspect:
Note: This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.
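For reference, the pattern the documentation recommends can be sketched as follows. This is a standalone illustration with a stand-in child that writes about 1 MiB to stdout; it is not the code in `core/ssl/utils.py`, and the 20-second timeout only mirrors the value in the diff above.

```python
import subprocess
import sys

# Stand-in child that writes ~1 MiB to stdout, more than a typical OS pipe buffer.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
try:
    # communicate() drains both pipes while waiting, so the child
    # cannot block on a full pipe buffer.
    out, err = proc.communicate(timeout=20)
except subprocess.TimeoutExpired:
    # Stop the child, then collect whatever output remains.
    proc.kill()
    out, err = proc.communicate()

exit_code = proc.returncode
```

Calling `proc.wait(timeout=20)` in place of `communicate()` here could hit the deadlock described in the note, because nothing would be reading from the pipes while the parent waits.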
Reading now https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
communicate() returns a tuple (stdout_data, stderr_data).
and looking at how detached_subprocess() is defined in core/ssl/utils.py, where it calls p.stdout.read() to grab the output, I am inclined to actually leave the loop and dp.poll() as it was.
With dp.wait() it consistently throws subprocess.TimeoutExpired, and openssl s_client -connect 127.0.0.1:4536 -verify_return_error -verify 2 never finishes; it just keeps running.
Hm.. it times out for me now with dp.poll() as well, but it worked before 🤔
Ok, I think there were two problems, and the other fix was stdin=subprocess.DEVNULL in utils.py.
The function doesn't wait for `openssl s_client ...` to finish. It treats the command still running as the success condition. However, the function should wait for the exit code from the binary. In production we saw intermittent but frequent `skale ssl upload` failures. This change should fix this problem and the underlying race condition.
Replace the for loop and dp.poll() with a more straightforward dp.wait() with a timeout, as requested during diff review.
Redirect the child's standard input to subprocess.DEVNULL, so it starts with no stdin attached. This prevents the OpenSSL health-check process from reading from, or blocking on, the parent's terminal or execution environment stdin stream.
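The effect of the stdin redirect can be seen in a small sketch. It uses a stand-in child that reads stdin until EOF instead of the real `openssl s_client` command, on the assumption that a child holding an open stdin is what kept the health check from exiting; the variable names are illustrative.

```python
import subprocess
import sys

# Stand-in child that reads stdin until EOF, then prints how much it read.
child = [
    sys.executable,
    "-c",
    "import sys; data = sys.stdin.read(); print(len(data))",
]

# With stdin=DEVNULL the child sees EOF immediately instead of waiting
# on the parent's terminal or inherited stdin stream.
proc = subprocess.Popen(
    child,
    stdin=subprocess.DEVNULL,
    stdout=subprocess.PIPE,
    text=True,
)
out, _ = proc.communicate(timeout=20)
bytes_read = int(out.strip())
exit_code = proc.returncode
```

Without the redirect, the same child would inherit the parent's stdin and could keep running until that stream is closed, which would be consistent with the timeouts reported in the review thread.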
With all 3 commits, command |
Tested my pull request interactively and via a systemd unit file; both work fine. |