feat: LLM driven kernel agent for testing by vgene · Pull Request #10 · aws-neuron/nkipy

vgene · 2026-02-04T09:55:03Z

Issue #, if available:

Description of changes:

Add new nkipy.tools.kernel_agent tool providing:

- CLI for discovering NumPy op support status on NKI hardware
- Multi-stage kernel execution (numpy, compile, hardware)
- LLM-based kernel generation via AWS Bedrock
- Op testing framework with dtype and shape validation
- Sweep test with a given set of simple prompts
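
For readers unfamiliar with the multi-stage idea, here is a minimal sketch of running the numpy, compile, and hardware stages in order and stopping at the first failure. `StageResult` and `run_stages` are hypothetical names for illustration only and are not the PR's `ExecutionResult` or `run_kernel`; the stage bodies are stand-in callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class StageResult:
    """Outcome of one stage (hypothetical type, not from the PR)."""
    stage: str
    ok: bool
    error: Optional[str] = None


def run_stages(stages: Sequence[Tuple[str, Callable[[], None]]]) -> List[StageResult]:
    """Run stages in order and stop at the first one that raises."""
    results: List[StageResult] = []
    for name, fn in stages:
        try:
            fn()
        except Exception as exc:
            # Record which stage failed and why, then skip later stages.
            results.append(StageResult(name, False, f"{type(exc).__name__}: {exc}"))
            break
        results.append(StageResult(name, True))
    return results


def _fail_compile() -> None:
    # Stand-in for a lowering failure in the compile stage.
    raise RuntimeError("unsupported op")


results = run_stages([
    ("numpy", lambda: None),     # reference run, assumed to pass here
    ("compile", _fail_compile),  # simulated compile failure
    ("hardware", lambda: None),  # never reached in this example
])
for r in results:
    print(r.stage, "ok" if r.ok else r.error)
```

Recording the failing stage by name makes it easy to tell invalid NumPy code apart from a genuine compile or hardware failure.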

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.



- Implement sweep functionality for continuous generation-and-test loops
- Include 50+ kernel prompt templates for common operations
- Add CLI commands: generate, sweep, list-prompts
- Configure package data to include prompt text files
- Update documentation with new commands and architecture overview

shaowz-aws · 2026-02-26T18:36:12Z

nkipy/src/nkipy/tools/kernel_agent/prompts/system_unconstrained.txt

+3. No loops or conditionals based on array values
+4. The function must be self-contained — only use numpy (imported as np)
+
+Output ONLY a JSON object such as:


Pairing this output format with a hand-written parser may still be unstable: there can be edge cases such as improperly escaped characters within the embedded source code, incorrect indentation, etc. We could consider using the Anthropic SDK, which supports Bedrock and structured output.

vgene · 2026-02-27

Changed the output to structured. The NumPy code itself is still hard to enforce, but the test executes it as NumPy first, so invalid code will show up at that stage.
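
To illustrate why tool use sidesteps the escaping problem raised above, here is a rough sketch that assumes the request and response shapes of the Bedrock Converse API (`toolConfig`, `toolUse`). The thread does not say which client the PR ended up using, so treat the tool name `submit_kernel`, the schema fields, and both helper functions as hypothetical. No network call is made; the response is a hand-built stand-in.

```python
KERNEL_TOOL_NAME = "submit_kernel"  # hypothetical tool name


def build_tool_config() -> dict:
    """Tool definition that forces the model to answer via one tool call."""
    return {
        "tools": [{
            "toolSpec": {
                "name": KERNEL_TOOL_NAME,
                "description": "Submit a NumPy kernel and its input specs.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {
                        "code": {"type": "string"},
                        "inputs": {
                            "type": "object",
                            "additionalProperties": {
                                "type": "object",
                                "properties": {
                                    "shape": {"type": "array",
                                              "items": {"type": "integer"}},
                                    "dtype": {"type": "string"},
                                },
                                # Both fields are mandatory for every input.
                                "required": ["shape", "dtype"],
                            },
                        },
                    },
                    "required": ["code", "inputs"],
                }},
            }
        }],
        "toolChoice": {"tool": {"name": KERNEL_TOOL_NAME}},
    }


def extract_tool_input(response: dict) -> dict:
    """Return the already-parsed tool arguments from a Converse-style response."""
    for block in response["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use and tool_use["name"] == KERNEL_TOOL_NAME:
            return tool_use["input"]
    raise ValueError("model did not call the expected tool")


# Hand-built stand-in for a model response; the code string contains
# quotes and newlines that a hand-written JSON parser could trip over.
fake_response = {"output": {"message": {"content": [
    {"toolUse": {
        "toolUseId": "example-id",
        "name": KERNEL_TOOL_NAME,
        "input": {
            "code": 'def kernel(x):\n    """softmax"""\n    return np.exp(x)',
            "inputs": {"x": {"shape": [4, 8], "dtype": "float32"}},
        },
    }},
]}}}

payload = extract_tool_input(fake_response)
print(payload["inputs"])
```

The schema constrains the envelope, not the semantics of the `code` string, which matches the point that the NumPy stage remains the check for invalid code.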

shaowz-aws · 2026-02-26T18:37:22Z

nkipy/src/nkipy/tools/kernel_agent/prompts/kernel_prompts.txt

@@ -0,0 +1,62 @@
+# ML activations / norms
+softmax along the last axis


Normally, I would also include things like input/output shapes and dtypes when asking the agent to write kernels; otherwise the LLM can produce results that are mostly reasonable but sometimes unexpected.

vgene · 2026-02-27

I want to keep the prompts as underspecified as possible so the LLM generates "random" NumPy code to test the lowering. The goal is to find corner cases that fail in NKIPy.
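
As a small illustration of how such a prompt file might be consumed, the sketch below assumes that lines starting with `#` are category headers and every other non-blank line is one prompt. Only the first two lines of `kernel_prompts.txt` are visible in this thread, so the format is an assumption, and `parse_prompts` is a hypothetical helper rather than the PR's loader.

```python
from typing import List, Optional, Tuple


def parse_prompts(text: str) -> List[Tuple[Optional[str], str]]:
    """Return (category, prompt) pairs from prompt-file text."""
    category: Optional[str] = None
    prompts: List[Tuple[Optional[str], str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Assumed convention: '#' introduces a category header.
            category = line.lstrip("#").strip()
            continue
        prompts.append((category, line))
    return prompts


sample = "# ML activations / norms\nsoftmax along the last axis\n"
pairs = parse_prompts(sample)
print(pairs)
```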

shaowz-aws · 2026-02-26T18:39:05Z

nkipy/src/nkipy/tools/kernel_agent/generator.py

+    input_specs = data.get("inputs", {})
+
+    inputs = {}
+    for inp_name, spec in input_specs.items():


This logic feels fragile. Maybe it's ok as a starting point.

vgene · 2026-02-27

Changed to tool use to make sure shape and dtype never fall through to defaults.
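
One way to picture the "no fall-through" behavior is a builder that rejects any spec lacking an explicit shape or dtype instead of silently substituting a default. This is a sketch under assumptions, not the code in `generator.py`: `build_inputs`, the `ALLOWED_DTYPES` allow-list, and the random value ranges are all hypothetical.

```python
import numpy as np

ALLOWED_DTYPES = {"float32", "float16", "int32"}  # hypothetical allow-list


def build_inputs(input_specs: dict, seed: int = 0) -> dict:
    """Create random arrays from specs, refusing incomplete or odd specs."""
    rng = np.random.default_rng(seed)
    inputs = {}
    for name, spec in input_specs.items():
        missing = {"shape", "dtype"} - set(spec)
        if missing:
            # No default shape or dtype: an incomplete spec is an error.
            raise ValueError(f"input {name!r} is missing {sorted(missing)}")
        dtype = spec["dtype"]
        if dtype not in ALLOWED_DTYPES:
            raise ValueError(f"input {name!r} has unsupported dtype {dtype!r}")
        shape = tuple(spec["shape"])
        if not all(isinstance(d, int) and d > 0 for d in shape):
            raise ValueError(f"input {name!r} has invalid shape {shape!r}")
        if np.issubdtype(np.dtype(dtype), np.integer):
            inputs[name] = rng.integers(-8, 8, size=shape).astype(dtype)
        else:
            inputs[name] = rng.standard_normal(shape).astype(dtype)
    return inputs


arrays = build_inputs({"x": {"shape": [2, 3], "dtype": "float32"}})
print(arrays["x"].shape, arrays["x"].dtype)
```

Failing loudly here keeps a malformed spec from being mistaken for a lowering bug later in the pipeline.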

shaowz-aws · 2026-02-26T18:39:34Z

nkipy/src/nkipy/tools/kernel_agent/ops.py

+from nkipy.tools.kernel_agent.executor import ExecutionResult, run_kernel
+
+# Target operations to test
+TARGET_OPS = {


I like this classification of ops!

vgene force-pushed the feat/kernel-agent branch from 822467b to 588500e Compare February 14, 2026 08:04

vgene changed the title ~~Feature: LLM driven kernel agent for testing~~ feat: LLM driven kernel agent for testing Feb 24, 2026

vgene force-pushed the feat/kernel-agent branch from 588500e to 0aff2b0 Compare February 25, 2026 12:13

vgene added 2 commits February 26, 2026 12:04

refactor(runtime): split kernel compilation and execution into separa…

81adb72

…te functions Extract `_compile_kernel` and `_execute_neff` helper functions from `baremetal_run_traced_kernel`.

vgene marked this pull request as ready for review February 26, 2026 12:30

vgene requested a review from a team February 26, 2026 12:30

shengxu-aws approved these changes Feb 26, 2026

shaowz-aws reviewed Feb 26, 2026

vgene force-pushed the feat/kernel-agent branch 2 times, most recently from 716a1fe to 81816ae Compare February 27, 2026 13:51

feat(kernel_agent): use tool use for json format conformation

fbb62c9

vgene force-pushed the feat/kernel-agent branch from 81816ae to fbb62c9 Compare February 27, 2026 13:52

vgene merged commit c4cdc7b into main Feb 27, 2026
4 checks passed

vgene merged 4 commits into main from feat/kernel-agent
