Description
I have a large code I'm working on. Somewhere deep in the bowels of this code I have a line that essentially amounts to:
long_array * tall_array

Much later, 3-4 levels up the call stack, I have a function that applies a cpn.sum over one of the axes, essentially throwing away one of the (large) dimensions.
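For concreteness, here is a minimal sketch of the shape of the problem, assuming `import cupynumeric as cpn`; the array shapes and function names are purely illustrative, and in the real code the multiply and the reduction live several call-stack levels apart:

```python
import cupynumeric as cpn

def scale(long_array, tall_array):
    # Deep in the call stack: a broadcasted elementwise multiply.
    # With illustrative shapes (N, 1) and (1, M) this materializes an
    # (N, M) intermediate, which is where the memory blows up.
    return long_array * tall_array

def reduce_one_axis(product):
    # Several levels up: the large axis is immediately summed away.
    return cpn.sum(product, axis=1)

long_array = cpn.ones((100_000, 1))
tall_array = cpn.ones((1, 50_000))
result = reduce_one_axis(scale(long_array, tall_array))  # shape (100_000,)
```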
Right now, this code scales very inefficiently. I'm forced to use multiple nodes purely for memory capacity reasons, but the code doesn't actually need them: the work runs far too quickly for distributed execution to make sense, so I'm essentially throwing away the extra compute.
The ideal solution would be to fuse the multiply and the sum to avoid the memory bloat. Doing this in user code is painful, because the code is deliberately factored for reuse: the offending multiply sits multiple levels down the call stack, each level serves a conceptually distinct purpose, and each could be called from arbitrary other code. Fusing by hand means breaking down the code's abstractions just to apply this one optimization (see the sketch below).
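Under the illustrative shapes above, the manual fusion amounts to something like the following hypothetical rewrite; getting there in the real code means pulling the reduction down into the layer that owns the multiply, which is exactly the abstraction break described above:

```python
import cupynumeric as cpn

# Hypothetical manually fused version of the pattern, with the same
# illustrative shapes as above. Algebraically,
#   sum_j(long[i, 0] * tall[0, j]) == long[i, 0] * sum_j(tall[0, j]),
# so the (N, M) intermediate never has to be materialized.
def fused_scale_and_reduce(long_array, tall_array):
    return long_array[:, 0] * cpn.sum(tall_array)

long_array = cpn.ones((100_000, 1))
tall_array = cpn.ones((1, 50_000))
fused = fused_scale_and_reduce(long_array, tall_array)  # shape (100_000,)
```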
If this could be done automatically with reasonable overhead, it would be far more effective from a code-reuse and readability perspective.
LANL/SLAC, medium priority.