Fix broken leaky ReLU operator (#36) #102

Open
albiol2004 wants to merge 1 commit into amd:devel from
albiol2004:fix-leaky-relu

@albiol2004

Added

  • aie_kernels/aie2/leaky_relu.cc : AIE2 kernel for Leaky ReLU (vector width 16, matching the structure of the AIE2P version and using the aie:: API).

Changed

  • iron/operators/leaky_relu/design.py : Fixed kernel parameter type: np.dtype[xfr_dtype] → xfr_dtype. The previous form produced a GenericAlias, which MLIR could not convert to a function type, causing a TypeError during compilation.
  • iron/operators/leaky_relu/test.py : Removed @pytest.mark.skip decorator.
  • README.md : Updated the Leaky ReLU row: removed "(WIP)", added the NPU1 support checkmark, and changed the status to green.
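
To illustrate the design.py fix, here is a minimal, stdlib-only sketch of why a subscripted dtype is rejected where a bare type is accepted. `DType`, `BFloat16`, and `to_arg_type` are hypothetical stand-ins (for np.dtype, the bfloat16 scalar type, and the MLIR type conversion, respectively); none of them is the project's actual code.

```python
import types
import typing


class DType:
    """Hypothetical stand-in for np.dtype: subscripting returns a GenericAlias."""
    __class_getitem__ = classmethod(types.GenericAlias)


class BFloat16:
    """Hypothetical stand-in for a scalar type such as bfloat16."""


def to_arg_type(py_type):
    """Hypothetical converter that only accepts bare classes.

    It models the failure described above: a GenericAlias is an annotation
    object, not a class, so the conversion raises TypeError.
    """
    if type(py_type) is types.GenericAlias:
        raise TypeError(f"cannot convert {py_type!r} to a function argument type")
    return py_type.__name__


xfr_dtype = BFloat16
alias = DType[xfr_dtype]  # analogous to np.dtype[xfr_dtype]

# The alias wraps the scalar type rather than being the scalar type.
print(type(alias) is types.GenericAlias)  # True
print(typing.get_args(alias))             # the wrapped type, inside a tuple

# Passing the bare type works; passing the alias raises TypeError.
print(to_arg_type(xfr_dtype))
try:
    to_arg_type(alias)
except TypeError as exc:
    print("TypeError:", exc)
```

The sketch only shows the shape of the problem: the subscripted form is useful as a type annotation, while a kernel signature needs the element type itself.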

Removed

Nothing.

Test results

  • 31/31 Leaky ReLU tests pass (input lengths 1K–8K, 1–8 columns, 1–2 channels, alpha=0.01)
  • No regressions in ReLU (8/8) or SiLU (4/4)
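
For readers unfamiliar with the operator, the following is a small pure-Python reference of what a Leaky ReLU test compares against. It is a sketch, not the project's test code: the function names are hypothetical, the chunked variant only mimics processing 16 elements at a time, and the `max(x, alpha * x)` form is one common formulation (valid for 0 < alpha < 1) that the kernel may or may not use.

```python
VECTOR_WIDTH = 16  # lanes per step, taken from the stated AIE2 vector width


def leaky_relu_ref(values, alpha=0.01):
    """Element-wise Leaky ReLU: x if x > 0, otherwise alpha * x."""
    return [x if x > 0 else alpha * x for x in values]


def leaky_relu_chunked(values, alpha=0.01, width=VECTOR_WIDTH):
    """Same result, computed `width` elements at a time.

    Assumes (for this sketch only) that the input length is a multiple of
    the vector width, and uses max(x, alpha * x), which equals Leaky ReLU
    whenever 0 < alpha < 1.
    """
    if len(values) % width != 0:
        raise ValueError("length must be a multiple of the vector width")
    out = []
    for start in range(0, len(values), width):
        chunk = values[start:start + width]
        out.extend(max(x, alpha * x) for x in chunk)
    return out


# 1K inputs spanning negative and positive values, alpha=0.01 as in the tests.
inputs = [(i - 512) / 64 for i in range(1024)]
expected = leaky_relu_ref(inputs)
actual = leaky_relu_chunked(inputs)
print(actual == expected)
```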

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.

Closes #36

  Two bugs:
  1. design.py passed np.dtype[bfloat16] (a GenericAlias) as the kernel's
     alpha parameter type, but MLIR expects a bare type. Fixed by changing
     np.dtype[xfr_dtype] to xfr_dtype.
  2. The AIE2 kernel was missing; only the AIE2P version existed. Added
     aie_kernels/aie2/leaky_relu.cc (vector width 16).

  Also removed pytest.mark.skip from test and updated README.md to
  reflect working status.
@github-actions
Contributor

📊 Test Results for Test Example Applications

709cb99 (2026_04_10_17_04_11)

IRONCLAD

Tested on 2026_04_10_17_04_11 at commit 709cb99.

| Test | Checks | TTFT (mean) | TPS (mean) |
|---|---|---|---|
📈 Trends (vs main branch) for Test Example Applications

IRONCLAD Trends

llama_3.2_1b

| Commit/Date | Num Tokens (max) | Num Tokens (mean) | Num Tokens (median) | Num Tokens (min) | Num Tokens (stddev) | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) | Total (max) | Total (mean) | Total (median) | Total (min) | Total (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 130b6ea (2025-12-05 21:33:12) | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 0.00 (n/a) | 4.71 (-0.42%) | 4.64 (-0.09%) | 4.64 (+0.65%) | 4.55 (-0.22%) | 0.05 (-17.66%) | 4.41 (-0.34%) | 4.39 (-0.19%) | 4.38 (-0.33%) | 4.37 (-0.15%) | 0.01 (-25.90%) | 12.96 (-0.00%) | 12.80 (+0.07%) | 12.80 (-0.23%) | 12.67 (+0.44%) | 0.09 (-21.12%) |
| 0a6c11c (2025-12-03 23:35:15) | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 0.00 (n/a) | 4.73 (n/a) | 4.64 (n/a) | 4.61 (n/a) | 4.56 (n/a) | 0.06 (n/a) | 4.42 (n/a) | 4.40 (n/a) | 4.40 (n/a) | 4.37 (n/a) | 0.02 (n/a) | 12.96 (n/a) | 12.79 (n/a) | 12.83 (n/a) | 12.62 (n/a) | 0.12 (n/a) |

llama_3.2_1b_prompt_1024_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|
| 912e6bc (2026-04-07 19:08:43) | 2.15 (-0.46%) | 2.13 (-0.47%) | 2.13 (-0.19%) | 2.11 (-0.85%) | 0.02 (+39.39%) |
| 2371174 (2026-04-06 17:38:48) | 2.16 (n/a) | 2.14 (n/a) | 2.14 (n/a) | 2.12 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_1024_tokens_40

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| 912e6bc (2026-04-07 19:08:43) | 4.21 (+0.86%) | 4.17 (+0.13%) | 4.16 (-0.05%) | 4.14 (-0.34%) | 0.03 (+220.64%) | 2.28 (+0.71%) | 2.16 (-0.29%) | 2.13 (-0.84%) | 2.12 (+0.19%) | 0.07 (+18.56%) |
| 2371174 (2026-04-06 17:38:48) | 4.17 (n/a) | 4.16 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.27 (n/a) | 2.16 (n/a) | 2.15 (n/a) | 2.11 (n/a) | 0.06 (n/a) |

llama_3.2_1b_prompt_13_tokens_1

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|
| 912e6bc (2026-04-07 19:08:43) | 2.10 (+0.29%) | 2.09 (+0.31%) | 2.09 (+0.43%) | 2.09 (+0.58%) | 0.01 (-40.31%) |
| 2371174 (2026-04-06 17:38:48) | 2.10 (n/a) | 2.08 (n/a) | 2.08 (n/a) | 2.07 (n/a) | 0.01 (n/a) |

llama_3.2_1b_prompt_13_tokens_40

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|
| 912e6bc (2026-04-07 19:08:43) | 4.18 (+0.12%) | 4.16 (+0.13%) | 4.16 (+0.14%) | 4.15 (+0.00%) | 0.01 (+12.35%) | 2.10 (-0.76%) | 2.09 (-0.22%) | 2.09 (-0.10%) | 2.07 (-0.48%) | 0.01 (-20.82%) |
| 2371174 (2026-04-06 17:38:48) | 4.18 (n/a) | 4.16 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.12 (n/a) | 2.09 (n/a) | 2.09 (n/a) | 2.08 (n/a) | 0.02 (n/a) |

llama_3.2_1b_prompt_2048_tokens_1

| Commit/Date | Num_Tokens (max) | Num_Tokens (mean) | Num_Tokens (median) | Num_Tokens (min) | Num_Tokens (stddev) | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 897d04e (2026-03-06 22:56:07) | 1.00 (+0.00%) | 1.00 (+0.00%) | 1.00 (+0.00%) | 1.00 (+0.00%) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 2.68 (-1.06%) | 2.68 (-1.06%) | 2.68 (-1.06%) | 2.68 (-1.06%) | 0.00 (n/a) |
| 84d3478 (2026-02-17 23:16:23) | 1.00 (n/a) | 1.00 (n/a) | 1.00 (n/a) | 1.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 2.70 (n/a) | 2.70 (n/a) | 2.70 (n/a) | 2.70 (n/a) | 0.00 (n/a) |

llama_3.2_1b_prompt_2048_tokens_40

| Commit/Date | Num_Tokens (max) | Num_Tokens (mean) | Num_Tokens (median) | Num_Tokens (min) | Num_Tokens (stddev) | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 897d04e (2026-03-06 22:56:07) | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 40.00 (+0.00%) | 0.00 (n/a) | 4.00 (-1.72%) | 4.00 (-1.72%) | 4.00 (-1.72%) | 4.00 (-1.72%) | 0.00 (n/a) | 2.70 (-0.44%) | 2.70 (-0.44%) | 2.70 (-0.44%) | 2.70 (-0.44%) | 0.00 (n/a) |
| 84d3478 (2026-02-17 23:16:23) | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 40.00 (n/a) | 0.00 (n/a) | 4.07 (n/a) | 4.07 (n/a) | 4.07 (n/a) | 4.07 (n/a) | 0.00 (n/a) | 2.71 (n/a) | 2.71 (n/a) | 2.71 (n/a) | 2.71 (n/a) | 0.00 (n/a) |

Development

Successfully merging this pull request may close these issues.

Fix or remove broken leaky ReLU kernel