Skip to content

Nested parallelism #993

@elliottslaughter

Description

@elliottslaughter

I want to take a cuPyNumeric program (with Python Legate tasks), and run multiple copies of it, with different data, on different subsets of hardware. This might be 1 GPU each or it might be multiple GPUs or nodes (e.g., 4 GPUs, 8 GPUs).

Right now to do this I'd have to turn my entire program into a Python Legate task, throwing away most of the benefit of cuPyNumeric, and limiting the resulting execution to 1 GPU each.

We could do much better if we had inner Python tasks: i.e., a Python task that can call nested cuPyNumeric operations, or other (nested) Python tasks.

A possible API for this might look like an inner keyword on the @task declaration:

@task(inner=True, ...)
def my_inner_task(...):
    ...

From a user perspective this would work like a normal Python task, excepted that nested distribution operations are permitted inside.

For LANL/SLAC.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions