The `layer_norm`, RMS norm, and softmax designs use the same parameter for both the number of elements to normalize over (batch size) and the buffer sizes on the cores (tile size).
We should rename `tile_size` to `matrix_columns`, `batch_size`, or something similar to avoid confusion.
Other kernels use `tile_size` as a parameter that affects only the data movement, not the output. For those kernels, you can tune `tile_size` to maximize L1 memory usage while still computing the same result. For the normalization kernels, on the other hand, changing `tile_size` changes the output.
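To illustrate the distinction (a minimal NumPy sketch, not the actual kernel code): for an elementwise operation, the tile boundary is invisible in the output, but for a normalization it determines which elements are reduced together, so a different tile size produces a different result.

```python
import numpy as np

x = np.arange(8, dtype=np.float64)

# Elementwise op: processing in two tiles of 4 gives the same result
# as a single pass over all 8 elements.
full = x * 2.0
tiled = np.concatenate([t * 2.0 for t in np.split(x, 2)])
assert np.allclose(full, tiled)

# Normalization (RMS norm here): normalizing each tile of 4 elements
# separately does NOT match normalizing over all 8 elements, because
# the reduction (mean of squares) now runs over a different span.
def rms_norm(v, eps=1e-6):
    return v / np.sqrt(np.mean(v * v) + eps)

full_norm = rms_norm(x)
tiled_norm = np.concatenate([rms_norm(t) for t in np.split(x, 2)])
assert not np.allclose(full_norm, tiled_norm)
```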
Additionally, we might want to add support for `batch_size < tile_size`, as this should be relatively simple: each kernel call would process N batches and maintain N means, variances, etc. `batch_size > tile_size` would be harder to implement, since it would require passing means, variances, etc. from one kernel call to the next, so we could simply raise an error in that case for now.
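The `batch_size < tile_size` case could look like the following hypothetical sketch (NumPy, not the real kernel): one kernel call receives a tile holding several small batches and keeps one mean and one variance per batch, and the unsupported `batch_size > tile_size` case errors out.

```python
import numpy as np

def layer_norm_small_batches(tile_data, tile_size, eps=1e-6):
    # tile_data: (n_batches, batch_size) -- several batches packed into
    # one tile, so n_batches * batch_size elements fit in tile_size.
    n_batches, batch_size = tile_data.shape
    if batch_size > tile_size:
        # Would need to carry partial means/variances across kernel
        # calls; unsupported for now, as proposed above.
        raise NotImplementedError("batch_size > tile_size not supported")
    mean = tile_data.mean(axis=1, keepdims=True)  # N means
    var = tile_data.var(axis=1, keepdims=True)    # N variances
    return (tile_data - mean) / np.sqrt(var + eps)

# Example: tile_size = 32 holding 4 batches of batch_size = 8.
tile = np.random.default_rng(0).normal(size=(4, 8))
out = layer_norm_small_batches(tile, tile_size=32)
# Each row (batch) is now normalized independently: zero mean, unit variance.
assert np.allclose(out.mean(axis=1), 0.0, atol=1e-9)
```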