Port mamba2 kernels and runtime from sglang#03c77dc by netanel-haber · Pull Request #412 · lightseekorg/tokenspeed

netanel-haber · 2026-06-10T12:58:58Z

Groundwork for NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 support.

This ports the required Mamba2 Triton kernels, mixer, and metadata classes from SGLang for the follow-up NemotronH architecture PRs.

lightseek-bot

Please use components already available in tokenspeed, such as some modules from Qwen 3.5, to support this, rather than adapting code from other projects.

Add the TokenSpeed Mamba2 runtime wrappers and Triton SSD kernels needed by hybrid Mamba2 models, with the local decode selective-state-update implementation omitted in favor of FlashInfer's maintained flashinfer.mamba.selective_state_update.

Provenance:
- Source repo: https://github.com/sgl-project/sglang
- Source commit: 03c77dc33d0a051aa15c1235407440d9d107b98f
- Source files adapted from SGLang:
  - python/sglang/srt/layers/attention/mamba/mamba.py
  - python/sglang/srt/layers/attention/mamba/mamba2_metadata.py
  - python/sglang/srt/layers/attention/mamba/mixer2_rms_norm_gated.py
  - python/sglang/srt/layers/attention/mamba/ops/ssd_bmm.py
  - python/sglang/srt/layers/attention/mamba/ops/ssd_chunk_scan.py
  - python/sglang/srt/layers/attention/mamba/ops/ssd_chunk_state.py
  - python/sglang/srt/layers/attention/mamba/ops/ssd_combined.py
  - python/sglang/srt/layers/attention/mamba/ops/ssd_state_passing.py

TokenSpeed adaptations:
- Use TokenSpeed Mapping, tensor-parallel helpers, linear layers, and weight loader hooks in the runtime mixer.
- Import Triton through tokenspeed_kernel._triton in the copied SSD kernels.
- Keep SGLang/vLLM/source-state comments in the copied kernel files.
- Use FlashInfer for selective_state_update instead of carrying SGLang's local mamba_ssm.py SSU implementation.

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0079f0df82


chatgpt-codex-connector · 2026-06-11T21:58:55Z

+
+import torch
+import torch.nn as nn
+from flashinfer.mamba import selective_state_update


Keep FlashInfer behind tokenspeed-kernel

In deployments where flashinfer-python is unavailable or unsupported (for example non-NVIDIA/runtime-only environments), importing this Mamba2 layer now raises before backend selection can fall back. The repo guidance says runtime code should use tokenspeed-kernel as the kernel boundary and keep third-party kernel libraries there, so this update path should be exposed through tokenspeed-kernel or optionalized instead of importing FlashInfer directly here.
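One way to read the "optionalized" suggestion above is a lazy, guarded import, so that a missing FlashInfer only fails when the Mamba2 decode path is actually used. The following is a minimal sketch under that assumption, not the approach taken in this PR: the helper names `optional_attr` and `get_selective_state_update` are hypothetical, and only `flashinfer.mamba.selective_state_update` comes from the diff. It also assumes that an unavailable FlashInfer surfaces as `ImportError`; other failure modes would need their own handling.

```python
import importlib
from typing import Any, Callable, Optional


def optional_attr(module_name: str, attr: str) -> Optional[Any]:
    """Return `attr` from `module_name`, or None if either is missing.

    Hypothetical helper: only ImportError (including ModuleNotFoundError)
    is treated as "backend not available".
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr, None)


def get_selective_state_update() -> Callable[..., Any]:
    """Resolve FlashInfer's kernel lazily, at first use rather than import time."""
    fn = optional_attr("flashinfer.mamba", "selective_state_update")
    if fn is None:
        # Raising here, instead of at module import, leaves room for
        # backend selection to pick a fallback before this path is hit.
        raise RuntimeError(
            "flashinfer.mamba.selective_state_update is not available"
        )
    return fn
```

With this shape, importing the layer module no longer requires FlashInfer; the dependency is resolved only when `get_selective_state_update()` is called. Routing the call through tokenspeed-kernel, as the review also suggests, would move the same guard behind that package boundary.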


github-actions · 2026-06-26T00:29:43Z

This PR has been inactive for 14 days and is marked as stale. It will be closed in 3 days if there is no further activity.

netanel-haber changed the title ~~port mamba2 kernels from sglang#03c77dc~~ Port mamba2 kernels from sglang#03c77dc Jun 10, 2026

netanel-haber changed the title ~~Port mamba2 kernels from sglang#03c77dc~~ Port mamba2 kernels and runtime from sglang#03c77dc Jun 10, 2026

lightseek-bot requested changes Jun 10, 2026


netanel-haber force-pushed the feature/mamba2-triton-kernels branch 2 times, most recently from d06d03e to 402ef26 Compare June 11, 2026 13:32

netanel-haber force-pushed the feature/mamba2-triton-kernels branch from 402ef26 to 79ace81 Compare June 11, 2026 14:35

Merge branch 'main' into feature/mamba2-triton-kernels

0079f0d

netanel-haber marked this pull request as ready for review June 11, 2026 21:56

netanel-haber requested a review from a team as a code owner June 11, 2026 21:56

chatgpt-codex-connector Bot reviewed Jun 11, 2026


github-actions Bot added the inactive label Jun 26, 2026

netanel-haber wants to merge 2 commits into lightseekorg:main from netanel-haber:feature/mamba2-triton-kernels

2 participants
