Add Kimi K2.5 generation vLLM runtime and K2.6/K2.7-Code models by Juno13340 · Pull Request #633 · ome-projects/ome

Juno13340 · 2026-06-22T22:00:15Z

What this PR does

Adds OME config for the Kimi K2.5-generation (KimiK25ForConditionalGeneration) models:

A single shared vLLM ClusterServingRuntime vllm-kimi-k25-single-node-8gpu (single node, 8 GPU, TP=8) that both models auto-select.
Two ClusterBaseModel entries: kimi-k2-6 (moonshotai/Kimi-K2.6) and kimi-k2-7-code (moonshotai/Kimi-K2.7-Code).
Registers all three in their respective kustomization.yaml files.

Why we need it

Enables serving the new Kimi K2.6 and K2.7-Code models on OME. Both share an identical vLLM serving config (same architecture, size range, args, images), so they are served by one generic runtime via autoSelect rather than duplicating per-model runtimes — consistent with how the Qwen3 guard/base runtimes were consolidated. The k25 naming keeps it distinct from the existing original-K2 runtimes, which use DeepseekV3ForCausalLM.
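As a rough illustration of why one runtime can cover both models, the sketch below matches a model to a runtime by architecture and size range. It is not OME's actual selection code: `RuntimeSpec`, `ModelSpec`, `parse_size`, and `supports` are hypothetical names, and the 300B upper bound and the 200B model sizes are assumed values chosen only for the example.

```python
from dataclasses import dataclass

# Multipliers for size strings such as "150B" (assumed notation for this sketch).
_SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}


def parse_size(text: str) -> float:
    """Convert a size string such as '150B' into a raw count."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return float(text[:-1]) * _SUFFIXES[text[-1]]
    return float(text)


@dataclass(frozen=True)
class RuntimeSpec:
    name: str
    architecture: str
    min_size: str
    max_size: str


@dataclass(frozen=True)
class ModelSpec:
    name: str
    architecture: str
    size: str


def supports(runtime: RuntimeSpec, model: ModelSpec) -> bool:
    """True when the architecture matches and the size falls inside the range."""
    if runtime.architecture != model.architecture:
        return False
    size = parse_size(model.size)
    return parse_size(runtime.min_size) <= size <= parse_size(runtime.max_size)


ARCH = "KimiK25ForConditionalGeneration"
runtime = RuntimeSpec("vllm-kimi-k25-single-node-8gpu", ARCH, "150B", "300B")  # max is assumed
k26 = ModelSpec("kimi-k2-6", ARCH, "200B")            # size is a placeholder
k27_code = ModelSpec("kimi-k2-7-code", ARCH, "200B")  # size is a placeholder
original_k2 = ModelSpec("hypothetical-original-k2", "DeepseekV3ForCausalLM", "200B")

for model in (k26, k27_code, original_k2):
    print(model.name, supports(runtime, model))
```

Under these assumptions both new models match the shared runtime, while a model reporting the DeepseekV3ForCausalLM architecture does not, which is the separation the k25 naming is meant to reflect.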

Fixes #

How to test

N/A — config-only change. Validated with kubectl kustomize config/runtimes and kubectl kustomize config/models (both build cleanly). After applying, a Kimi K2.6/K2.7-Code model auto-selects vllm-kimi-k25-single-node-8gpu without naming it in the InferenceService.

Checklist

Tests added/updated (if applicable) — N/A (config only)
Docs updated (if applicable) — N/A
kubectl kustomize builds locally for both config/runtimes and config/models

YouNeedCryDear · 2026-06-23T00:43:58Z

+        #   tensorParallelismOverride:
+        #     tensorParallelSize: 8
+  modelSizeRange:
+    min: 150B


How is this size calculated? Isn't it a 1T parameter model?

Juno13340 · Jun 23, 2026

Logically it is 1T, yes, but OME matches on the safetensors element count, which ignores dtype. These checkpoints are int4-packed, so the count comes out at roughly 150–300B, not 1T. I tried 900–1100B first and autoSelect failed for exactly this reason.
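The arithmetic behind this reply can be sketched as follows. The sketch assumes, purely for illustration, that eight int4 values are packed into each stored element and that a small share of the weights stays unquantized. The helper name and the percentages are hypothetical and are not read from the actual checkpoints. It also ignores quantization scales and other auxiliary tensors.

```python
def stored_element_count(
    logical_params: int,
    quantized_percent: int,
    values_per_element: int = 8,  # assumption: 8 int4 values per stored element
) -> int:
    """Estimate how many tensor elements a packed checkpoint stores."""
    quantized = logical_params * quantized_percent // 100
    unquantized = logical_params - quantized
    # Ceiling division: a partially filled element still counts as one element.
    packed = -(-quantized // values_per_element)
    return packed + unquantized


LOGICAL_PARAMS = 10**12  # "1T" logical parameters

for percent in (95, 90):
    count = stored_element_count(LOGICAL_PARAMS, percent)
    print(f"{percent}% quantized -> {count / 1e9:.2f}B stored elements")
```

With these assumed shares the estimate lands between 150B and 300B, far below the 900–1100B range that failed to match.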

Juno13340 requested review from CatherineSue, XinyueZhang369 and slin1237 as code owners June 22, 2026 22:00

github-actions bot added the runtime, models, and config labels Jun 22, 2026

Add Kimi K2.5 generation vLLM runtime and K2.6/K2.7-Code models

8935008

Juno13340 force-pushed the genhuang/kimi-k2-6-k2-7-runtimes branch from ebb9ed0 to 8935008 June 22, 2026 22:06

YouNeedCryDear requested changes Jun 23, 2026

Juno13340 added 3 commits June 23, 2026 13:06

Address review: drop unused fit-quantization annotation, add router requests, use /health startup probe, remove dead comments

7edbe69

Add TEXT_TO_TEXT capability to Kimi K2.6/K2.7-Code, matching general-purpose multimodal model convention

421852c

Rename Kimi K2.5 runtime to vllm-kimi-k25-tp8

b4a2ad6

Juno13340 wants to merge 4 commits into ome-projects:main from Juno13340:genhuang/kimi-k2-6-k2-7-runtimes
