fix(sampling): fused_topk_topp PDL race causing IMA by jaywme · Pull Request #536 · lightseekorg/tokenspeed

jaywme · 2026-06-26T19:35:43Z

Summary

Root cause: applyKernel (launched with cudaLaunchAttributeProgrammaticStreamSerialization) read top_k_idx[] before air_topk_11bits_fused_last finished writing it. Under heavy HBM contention (kvstore D2H writeback), the race window widened enough that applyKernel read the uninitialised sentinel value INT32_MAX as a token index, producing an out-of-bounds store to out_probs + 130 TB → Illegal Memory Access (IMA) crash.
Fix: Add cudaGridDependencySynchronize() at applyKernel entry so the kernel correctly waits for its PDL producer before reading top-k outputs.
pdl_enabled() toggle: Rename launchPDL → launchKernel(enable_pdl, …) and thread the global pdl_enabled() flag from Python callers through fused_topk_topp_renorm(enable_pdl=) down to every cudaLaunchKernelEx call, consistent with the rest of the codebase.
Bounds guard: Add idx < vocab_size check in the applyKernel write loop as a belt-and-suspenders guard against any future sentinel leak.
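
The synchronization fix and the bounds guard described above can be sketched together as follows. This is a minimal illustration, not the code from the PR: `applyKernelSketch`, `isValidTokenIndex`, the buffer layout (one block per batch row, `k` entries per row), and the extra `idx >= 0` check are all assumptions made for the example. Only `cudaGridDependencySynchronize()` and the `idx < vocab_size` guard come from the description.

```cuda
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical helper (not from the PR): true only for indices that address a
// real vocabulary slot. The PR describes an `idx < vocab_size` check; the
// `idx >= 0` half is an extra assumption added for this sketch.
__host__ __device__ inline bool isValidTokenIndex(int32_t idx, int vocab_size)
{
    return idx >= 0 && idx < vocab_size;
}

// Hypothetical stand-in for applyKernel. One block per batch row scatters the
// k selected probabilities into a dense [batch, vocab_size] output.
__global__ void applyKernelSketch(const int32_t* top_k_idx, const float* top_k_val,
                                  float* out_probs, int k, int vocab_size)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 900
    // When launched with programmatic stream serialization, this grid may start
    // before its producer has finished. Wait here until the producer grid has
    // completed and its global-memory writes are visible.
    cudaGridDependencySynchronize();
#endif
    const size_t row = blockIdx.x;
    for (int i = threadIdx.x; i < k; i += blockDim.x) {
        const int32_t idx = top_k_idx[row * k + i];
        // Belt-and-suspenders: a leaked sentinel such as INT32_MAX is skipped
        // instead of being used as an offset into out_probs.
        if (isValidTokenIndex(idx, vocab_size)) {
            out_probs[row * vocab_size + idx] = top_k_val[row * k + i];
        }
    }
}
```

The guard alone would hide the crash but not the race: without the synchronization call, the kernel could still read stale indices that happen to be in range, so the two changes address different failure modes.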

Test Plan

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28dfcc1172

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

tokenspeed/tokenspeed-kernel/python/tokenspeed_kernel/thirdparty/cuda/csrc/fused_topk_topp/fused_topk_topp.cu

Lines 611 to 612 in f5d3a6e

    air_top_p::launchRadixOnly<float>(toppCounters, toppHistograms, toppCountHistograms,
                                      toppBuf1, toppBuf2, batchSize, vocabSize, msStream);

Thread enable_pdl into the top-p radix launch

When pdl_enabled() is false, the new flag only disables the launches in this file; this call still enters air_top_p::launchRadixOnly, whose local launcher hard-codes cudaLaunchAttributeProgrammaticStreamSerialization (air_top_p.cuh:505-516). That means --disable-pdl or non-Hopper NVIDIA runs still issue PDL launches whenever the fused path runs, so the intended fallback can still hit unsupported/disabled PDL instead of behaving like the rest of the gated kernels. Pass enable_pdl through to launchRadixOnly and use it for those cudaLaunchKernelEx calls too.
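
One way to make a launcher honor the flag, as the review suggests, is to attach the PDL launch attribute only when `enable_pdl` is true. The sketch below is an assumption about how such a helper could look; `makeLaunchConfig` and `launchKernelSketch` are hypothetical names, and the actual signatures of `launchRadixOnly` and the PR's `launchKernel` are not shown in this thread.

```cuda
#include <cstddef>
#include <utility>
#include <cuda_runtime.h>

// Hypothetical helper: fill a launch config and attach the programmatic stream
// serialization (PDL) attribute only when enable_pdl is true. attr_storage
// must stay alive until the launch call has returned.
inline cudaLaunchConfig_t makeLaunchConfig(bool enable_pdl, dim3 grid, dim3 block,
                                           size_t smem_bytes, cudaStream_t stream,
                                           cudaLaunchAttribute* attr_storage)
{
    cudaLaunchConfig_t config = {};
    config.gridDim = grid;
    config.blockDim = block;
    config.dynamicSmemBytes = smem_bytes;
    config.stream = stream;

    attr_storage->id = cudaLaunchAttributeProgrammaticStreamSerialization;
    attr_storage->val.programmaticStreamSerializationAllowed = enable_pdl ? 1 : 0;
    config.attrs = attr_storage;
    // With numAttrs == 0 the attribute is ignored and the launch is ordinary
    // stream-ordered.
    config.numAttrs = enable_pdl ? 1u : 0u;
    return config;
}

// Hypothetical launcher taking the flag as its first argument, so callers such
// as a radix-only top-p path can forward the same enable_pdl value.
template <typename... KernelArgs, typename... Args>
cudaError_t launchKernelSketch(bool enable_pdl, void (*kernel)(KernelArgs...),
                               dim3 grid, dim3 block, size_t smem_bytes,
                               cudaStream_t stream, Args&&... args)
{
    cudaLaunchAttribute attr = {};
    cudaLaunchConfig_t config =
        makeLaunchConfig(enable_pdl, grid, block, smem_bytes, stream, &attr);
    return cudaLaunchKernelEx(&config, kernel, std::forward<Args>(args)...);
}
```

Keeping the decision in a single helper means a nested launcher cannot hard-code the attribute by accident; every call site has to pass the flag explicitly.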

fix pld for fused topk topp

28dfcc1

jaywme requested a review from a team as a code owner June 26, 2026 19:35

jaywme requested a review from yweng0828 June 26, 2026 19:35

Merge branch 'main' into jay/fix-pdl-for-fused-topk-topp

56e27c2

chatgpt-codex-connector Bot reviewed Jun 26, 2026

Comment thread ...peed-kernel/python/tokenspeed_kernel/thirdparty/cuda/csrc/fused_topk_topp/fused_topk_topp.cu

jaywme and others added 2 commits June 27, 2026 02:07

update

4abe75c

Merge branch 'main' into jay/fix-pdl-for-fused-topk-topp

f5d3a6e

chatgpt-codex-connector Bot reviewed Jun 27, 2026

lightseek-bot approved these changes Jun 27, 2026

lightseek-bot merged commit 2ed8197 into main Jun 27, 2026
33 of 38 checks passed

lightseek-bot deleted the jay/fix-pdl-for-fused-topk-topp branch June 27, 2026 05:12

lightseek-bot merged 4 commits into main from jay/fix-pdl-for-fused-topk-topp
