Fusing large operations to reduce memory usage #1237

@elliottslaughter

I have a large code I'm working on. Somewhere deep in the bowels of this code I have a line that essentially amounts to:

long_array * tall_array

Much later, 3-4 levels up the call stack, I have a function that applies a cpn.sum over one of the axes, essentially throwing away one of the (large) dimensions.
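A minimal sketch of the pattern described above, to make the memory cost concrete. Only the array names come from the issue. The shapes, sizes, and the helper names `scale` and `reduce_rows` are hypothetical, and NumPy stands in for cuPyNumeric on the assumption that the same calls apply:

```python
import numpy as np  # assumption: stand-in for the cuPyNumeric module used in the real code


def scale(long_array, tall_array):
    # Deep in the call stack: broadcasting (1, n) against (m, 1)
    # materializes a full (m, n) temporary.
    return long_array * tall_array


def reduce_rows(product):
    # Several levels up: one of the large dimensions is summed away.
    return product.sum(axis=1)


m, n = 200, 300  # hypothetical sizes, kept small so the sketch runs anywhere
tall_array = np.ones((m, 1))
long_array = np.ones((1, n))

product = scale(long_array, tall_array)  # m * n elements live in memory here
result = reduce_rows(product)            # only m elements survive

print(product.shape, product.nbytes)
print(result.shape, result.nbytes)
```

The temporary is n times larger than the final result, so peak memory is set by an array that is discarded almost immediately.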

Right now, this code scales very inefficiently. I'm forced to use multiple nodes purely for memory capacity reasons, but the code actually doesn't need it: it runs far too quickly for distributed execution to make sense, and I'm essentially throwing away the compute.

The ideal solution would be to fuse the multiply and the sum to avoid the memory bloat. Doing this in user code is painful, because the code is specifically factored to be reusable. The offending multiply is multiple levels down the call stack, every level of which serves a conceptually distinct purpose and could be called from arbitrary other code. I'm essentially breaking down the code's abstractions in order to apply this optimization manually.
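For reference, one way the manual fusion could look is a blocked reduction. This is a sketch under the same assumptions as before (hypothetical shapes `(1, n)` and `(m, 1)`, NumPy standing in for cuPyNumeric), and the function name `fused_multiply_sum` and the `block` parameter are invented for illustration, not taken from the real code:

```python
import numpy as np  # assumption: stand-in for the cuPyNumeric module used in the real code


def fused_multiply_sum(long_array, tall_array, block=1024):
    """Sum (long_array * tall_array) over axis 1 without building the full product.

    long_array has shape (1, n), tall_array has shape (m, 1).
    """
    m = tall_array.shape[0]
    n = long_array.shape[1]
    out = np.zeros(m, dtype=np.result_type(long_array, tall_array))
    for start in range(0, n, block):
        chunk = long_array[:, start:start + block]  # shape (1, <= block)
        # The temporary is now at most (m, block) instead of (m, n).
        out += (chunk * tall_array).sum(axis=1)
    return out
```

Note that the multiply and the sum now have to live in the same function, which is exactly the abstraction-breaking the paragraph above describes. Whether a Python-level block loop like this performs acceptably under cuPyNumeric is not something the sketch establishes.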

If this could be done automatically with reasonable overhead, that would be far more effective from a code reuse and readability perspective.

LANL/SLAC, medium priority.
