While firefighting with @lucekdudek goth problems in golemfactory/yapapi#1086 we found a typo, where test should encounter error when starting command on host, but somehow we were getting pattern awaiting timeouts. Pattern awaiting should not even happen, because command could not even start properly. We started searching and we have interesting finding.
This will be a complex one, buckle up!
Goth tests have this portion of code:
requestor = runner.get_probes(probe_type=RequestorProbe)[0]
async with requestor.run_command_on_host("some command") as (_cmd_task, cmd_monitor, _process_monitor):
cmd_monitor.add_assertion(assert_no_errors)
await cmd_monitor.wait_for_pattern("some pattern", timeout=20)
run_command_on_host() starts command on host asynchronously, it give us few bits to control that task in context manager. Let's see how context manager is constructed.
try:
task = asyncio.create_task(cmd_task)
yield task
await task
except Exception:
# do some cleaning about exceptions
raise
finally:
# close cmd fully
Let's assume that running cmd_task coroutine will raise some startup exception instantly. Because we're calling create_task, we're using asyncio.Task API, so to check for errors we should call task.result(), task.exception() or await that task. We're not doing this for now, and we're yielding task, giving out control from context manager "startup" to context manager's body. There, we're using .wait_for_pattern() stuff, that after some time will raise timeout exception. Raising uncaught exception in context manager body exists that context manager. So, we're back in context manager at the line with yield, but at this point we're not going further with code execution, we're raising timeout exception. This timeout exception is handled by try block, and finally our whole context manager exists with an timeout exception.
But wait, were is our task startup exception handling? Well, its inside of task, but we never checked it. Python's Task exception was never retrieved sound familiar? Well, this is the case. tasks startup exception will be retrieved only when context manager's body will exit itself without any exception. Then await task would happen, and we'll got startup exception.
Possible solutions
We have few options here, one of them is to change context manager to function decorator. In that case, cmd_task and function body could run asynchronously, and by using asyncio.wait(..., return_when=FIRST_EXCEPTION) we could control exceptions from both sides, and behave properly.
Other solution would be to check cmd_task error each time we are using any assertion-like function from goth.
While firefighting with @lucekdudek goth problems in golemfactory/yapapi#1086 we found a typo, where test should encounter error when starting command on host, but somehow we were getting pattern awaiting timeouts. Pattern awaiting should not even happen, because command could not even start properly. We started searching and we have interesting finding.
This will be a complex one, buckle up!
Goth tests have this portion of code:
run_command_on_host()starts command on host asynchronously, it give us few bits to control that task in context manager. Let's see how context manager is constructed.Let's assume that running
cmd_taskcoroutine will raise some startup exception instantly. Because we're callingcreate_task, we're usingasyncio.TaskAPI, so to check for errors we should calltask.result(),task.exception()orawaitthat task. We're not doing this for now, and we're yieldingtask, giving out control from context manager "startup" to context manager's body. There, we're using.wait_for_pattern()stuff, that after some time will raise timeout exception. Raising uncaught exception in context manager body exists that context manager. So, we're back in context manager at the line withyield, but at this point we're not going further with code execution, we're raising timeout exception. This timeout exception is handled bytryblock, and finally our whole context manager exists with an timeout exception.But wait, were is our
taskstartup exception handling? Well, its inside oftask, but we never checked it. Python'sTask exception was never retrievedsound familiar? Well, this is the case.tasksstartup exception will be retrieved only when context manager's body will exit itself without any exception. Thenawait taskwould happen, and we'll got startup exception.Possible solutions
We have few options here, one of them is to change context manager to function decorator. In that case, cmd_task and function body could run asynchronously, and by using
asyncio.wait(..., return_when=FIRST_EXCEPTION)we could control exceptions from both sides, and behave properly.Other solution would be to check cmd_task error each time we are using any assertion-like function from goth.