Description
I want to take a cuPyNumeric program (with Python Legate tasks), and run multiple copies of it, with different data, on different subsets of hardware. This might be 1 GPU each or it might be multiple GPUs or nodes (e.g., 4 GPUs, 8 GPUs).
Right now, to do this I'd have to turn my entire program into a Python Legate task, throwing away most of the benefit of cuPyNumeric and limiting the resulting execution to 1 GPU each.
We could do much better if we had inner Python tasks: i.e., a Python task that can call nested cuPyNumeric operations, or other (nested) Python tasks.
A possible API for this might look like an inner keyword on the @task declaration:
@task(inner=True, ...)
def my_inner_task(...):
    ...

From a user perspective, this would work like a normal Python task, except that nested distributed operations are permitted inside.
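To make the intended semantics concrete, here is a rough pure-Python mock of how this might look from the user's side. It is only a sketch under stated assumptions: the `task` decorator below is a hypothetical stand-in written for this example (not Legate's actual decorator), NumPy stands in for cuPyNumeric, and the names `scale` and `simulate` are made up. It does not model placement on hardware subsets at all; it only shows an inner task calling array operations and another task, and several copies being launched on different data.

```python
import functools

import numpy as np  # stand-in for cupynumeric in this mock


def task(func=None, *, inner=False):
    """Hypothetical stand-in for the @task decorator.

    It only records the proposed `inner` flag on the wrapped function;
    a real implementation would do much more.
    """
    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            return f(*args, **kwargs)
        wrapper.inner = inner  # proposed flag: nested operations permitted
        return wrapper
    # Support both bare @task and @task(inner=True)
    return decorate if func is None else decorate(func)


@task
def scale(x, factor):
    # Ordinary (leaf) task: no nested tasks or distributed operations inside.
    return x * factor


@task(inner=True)
def simulate(data):
    # Inner task: may call array operations and other tasks.
    centered = data - np.mean(data)
    return float(np.sum(scale(centered, 2.0) ** 2))


# Run independent copies of the same program on different data.
datasets = [np.arange(4.0), np.arange(8.0)]
results = [simulate(d) for d in datasets]
```

In the actual feature, each call to `simulate` would presumably be mapped to its own subset of GPUs or nodes; how that subset is chosen is left open here.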
For LANL/SLAC.