Skip to content

Exceptions in Probe.run_command_on_host context are shadowing exceptions from command on host #611

@approxit

Description

@approxit

While firefighting with @lucekdudek goth problems in golemfactory/yapapi#1086 we found a typo, where test should encounter error when starting command on host, but somehow we were getting pattern awaiting timeouts. Pattern awaiting should not even happen, because command could not even start properly. We started searching and we have interesting finding.

This will be a complex one, buckle up!

Goth tests have this portion of code:

requestor = runner.get_probes(probe_type=RequestorProbe)[0]

async with requestor.run_command_on_host("some command") as (_cmd_task, cmd_monitor, _process_monitor):
    cmd_monitor.add_assertion(assert_no_errors)

    await cmd_monitor.wait_for_pattern("some pattern", timeout=20)

run_command_on_host() starts command on host asynchronously, it give us few bits to control that task in context manager. Let's see how context manager is constructed.

try:
    task = asyncio.create_task(cmd_task)
    yield task
    await task
except Exception:
    # do some cleaning about exceptions
    raise
finally:
    # close cmd fully

Let's assume that running cmd_task coroutine will raise some startup exception instantly. Because we're calling create_task, we're using asyncio.Task API, so to check for errors we should call task.result(), task.exception() or await that task. We're not doing this for now, and we're yielding task, giving out control from context manager "startup" to context manager's body. There, we're using .wait_for_pattern() stuff, that after some time will raise timeout exception. Raising uncaught exception in context manager body exists that context manager. So, we're back in context manager at the line with yield, but at this point we're not going further with code execution, we're raising timeout exception. This timeout exception is handled by try block, and finally our whole context manager exists with an timeout exception.

But wait, were is our task startup exception handling? Well, its inside of task, but we never checked it. Python's Task exception was never retrieved sound familiar? Well, this is the case. tasks startup exception will be retrieved only when context manager's body will exit itself without any exception. Then await task would happen, and we'll got startup exception.

Possible solutions

We have few options here, one of them is to change context manager to function decorator. In that case, cmd_task and function body could run asynchronously, and by using asyncio.wait(..., return_when=FIRST_EXCEPTION) we could control exceptions from both sides, and behave properly.

Other solution would be to check cmd_task error each time we are using any assertion-like function from goth.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions