Create performance model for reduction operators #545

Open
gabeweisz wants to merge 8 commits into main from feat/gw_perf_model_for_reduction_ops

@gabeweisz
Collaborator

Fixes #533

The performance model is currently bare-bones: it assumes that O(n) operations are strictly necessary for these reductions.

It could be tuned further, but that would depend heavily on the implementation. The theoretical minimum is something like n/2 + log(n) operations, which is unlikely to be significantly more accurate.
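
To make the comparison concrete, here is a minimal sketch of the two estimates mentioned above. The function names, the one-operation-per-element convention, and the read-inputs-plus-write-outputs byte count are illustrative assumptions, not the code in this PR.

```python
import math


def reduce_flops_linear(num_elements: int) -> int:
    """O(n) estimate: one operation per input element (assumed convention)."""
    return num_elements


def reduce_flops_lower_bound(num_elements: int) -> float:
    """The n/2 + log2(n) figure quoted in the description."""
    if num_elements < 1:
        return 0.0
    return num_elements / 2 + math.log2(num_elements)


def reduce_bytes(num_elements: int, num_outputs: int, bytes_per_element: int) -> int:
    """Bytes moved, assuming every input is read once and every output written once."""
    return (num_elements + num_outputs) * bytes_per_element


n = 1024
linear = reduce_flops_linear(n)
bound = reduce_flops_lower_bound(n)
moved = reduce_bytes(n, num_outputs=1, bytes_per_element=4)
```

The two estimates differ by roughly a constant factor of two, so either one yields the same order of magnitude for the reported throughput.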

@gabeweisz gabeweisz requested a review from Copilot March 18, 2026 19:48
Contributor

Copilot AI left a comment

Pull request overview

Adds a first-pass performance model and categorization for single-GPU reduction-style ATen operators, and updates regression artifacts to reflect the new modeling in generated perf reports.

Changes:

  • Introduce Reduce / aten_reduce perf model classes to estimate FLOPs/bytes for reduction-like ops.
  • Map common aten:: reduce ops to the new perf model and categorize them under a new "Reduce" category.
  • Add a new checked-in MI300 perf report reference .xlsx used by perf-report regression tests.
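
As a rough illustration of the first two changes, the sketch below pairs a reduction model with an op-name mapping and a category lookup. All names here (`ReduceSketch`, `OP_TO_MODEL`, `categorize`, `achieved_tb_per_s`), the list of ops, and the FLOP/byte conventions are hypothetical stand-ins, not the classes or tables in TraceLens.

```python
from dataclasses import dataclass


@dataclass
class ReduceSketch:
    """Hypothetical stand-in for a reduction perf model."""
    num_elements: int
    num_outputs: int
    bytes_per_element: int

    def flops(self) -> int:
        # Assumption: one operation per input element.
        return self.num_elements

    def bytes(self) -> int:
        # Assumption: read every input once, write every output once.
        return (self.num_elements + self.num_outputs) * self.bytes_per_element


# Illustrative mapping only; the ops actually registered by the PR may differ.
OP_TO_MODEL = {
    "aten::sum": ReduceSketch,
    "aten::mean": ReduceSketch,
    "aten::amax": ReduceSketch,
}


def categorize(op_name: str) -> str:
    """Return "Reduce" for mapped reduction ops, "other" otherwise."""
    return "Reduce" if op_name in OP_TO_MODEL else "other"


def achieved_tb_per_s(model: ReduceSketch, duration_us: float) -> float:
    """Achieved memory throughput in TB/s for a kernel lasting duration_us."""
    return model.bytes() / (duration_us * 1e-6) / 1e12


model = OP_TO_MODEL["aten::sum"](num_elements=1_000_000, num_outputs=1, bytes_per_element=2)
throughput = achieved_tb_per_s(model, duration_us=2.0)
```

Keeping the mapping as a plain dictionary means a new reduction op can be covered by adding one entry, without touching the model class.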

Reviewed changes

Copilot reviewed 2 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| TraceLens/PerfModel/perf_model.py | Adds Reduce and aten_reduce perf model implementation (FLOPs/bytes + param parsing). |
| TraceLens/PerfModel/torch_op_mapping.py | Registers reduce ops → aten_reduce, adds Reduce category and categorization branch. |
| tests/traces/mi300/Qwen_Qwen1.5-0.5B-Chat__1016005_perf_report.xlsx | New reference perf report for regression testing. |

gabeweisz and others added 3 commits March 19, 2026 09:13
@gabeweisz gabeweisz added the perf_model Add performance model for calculating TFLOPS/s and TB/s label Mar 19, 2026
@gabeweisz gabeweisz marked this pull request as ready for review March 20, 2026 20:43
@gabeweisz gabeweisz requested a review from ajassani March 20, 2026 20:43
Labels

perf_model Add performance model for calculating TFLOPS/s and TB/s

Development

Successfully merging this pull request may close these issues.

Performance Model for Reduce Kernels

2 participants