fix(handshake): add missing vllm_version field to EngineCoreReadyResponse payload#459

Open
iamagenius00 wants to merge 1 commit into
GradientHQ:mainfrom
iamagenius00:fix/engine-core-ready-vllm-version

@iamagenius00

Summary

The Rust vllm-rs frontend added in #457 expects 5 fields in the EngineCoreReadyResponse handshake payload, but engine_core_ready_payload in Python only emits 4 — missing vllm_version. As a result, any Mac/MLX worker crashes at handshake time with:

Error: failed to connect to engine core
Caused by:
    messagepack decode failed for vllm_engine_core_client::protocol::handshake::EngineCoreReadyResponse:
    missing field `vllm_version`;
    value fallback: {"max_model_len": 7168, "num_gpu_blocks": 0, "dp_stats_address": nil, "dtype": "bfloat16"}

This blocks all inference (/v1/chat/completions, /v1/completions, etc. all return 500) — the HTTP layer never gets a chance to route requests because the frontend ↔ engine IPC never establishes.

Root cause

The shipped vllm-rs binary's strings show:

struct EngineCoreReadyResponse with 5 elements
  max_model_len, num_gpu_blocks, dtype, vllm_version, dp_stats_address

But src/parallax/server/engine_core_protocol.py::engine_core_ready_payload only includes 4 of those, omitting vllm_version. The "value fallback" dump in the decode error above, which lists the keys the Python side actually sent ({max_model_len, num_gpu_blocks, dp_stats_address, dtype}), confirms this exact set.
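
The mismatch can be reproduced without the binary by diffing the two field lists quoted above. This is a minimal sketch; the set names `RUST_EXPECTED` and `PYTHON_EMITTED` are illustrative and do not exist in the codebase:

```python
# Field names reported by the vllm-rs binary strings for EngineCoreReadyResponse.
RUST_EXPECTED = {
    "max_model_len",
    "num_gpu_blocks",
    "dtype",
    "vllm_version",
    "dp_stats_address",
}

# Keys present in the "value fallback" dump from the handshake error.
PYTHON_EMITTED = {"max_model_len", "num_gpu_blocks", "dp_stats_address", "dtype"}

# Fields the Rust decoder requires but the Python payload never sends.
missing = sorted(RUST_EXPECTED - PYTHON_EMITTED)
print(missing)
```

The only entry in `missing` is `vllm_version`, matching the field named in the decode error.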

Fix

Add vllm_version as a keyword-only arg with default "0.0.0+parallax", and include it in the dict. The Rust side appears to only check field presence (no version comparison logic visible in the binary), so any string suffices; the "0.0.0+parallax" marker makes it clear in debug logs that this came from the Python MLX path rather than upstream vLLM. Callers can override if needed.

 def engine_core_ready_payload(
     *,
     max_model_len: int,
     dtype: Optional[str],
     num_gpu_blocks: int = 0,
     dp_stats_address: Optional[str] = None,
+    vllm_version: str = "0.0.0+parallax",
 ) -> bytes:
     """Build the registration payload sent by the engine DEALER socket."""
     return _pack_msgpack(
         {
             "max_model_len": int(max_model_len),
             "num_gpu_blocks": int(num_gpu_blocks),
             "dp_stats_address": dp_stats_address,
             "dtype": dtype,
+            "vllm_version": vllm_version,
         }
     )
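
A regression check along these lines could guard against the field going missing again. This is only a sketch: `ready_fields` is a hypothetical stand-in that mirrors the dict built by the patched function before it is handed to `_pack_msgpack`, `missing_fields` is an illustrative helper, and the override value `"0.9.0"` is an arbitrary string chosen for the example:

```python
from typing import Any, Dict, List, Optional

# Fields the Rust frontend expects, per the binary strings quoted in this PR.
REQUIRED_FIELDS = (
    "max_model_len",
    "num_gpu_blocks",
    "dtype",
    "vllm_version",
    "dp_stats_address",
)


def ready_fields(
    *,
    max_model_len: int,
    dtype: Optional[str],
    num_gpu_blocks: int = 0,
    dp_stats_address: Optional[str] = None,
    vllm_version: str = "0.0.0+parallax",
) -> Dict[str, Any]:
    """Hypothetical stand-in: the patched payload dict, before msgpack packing."""
    return {
        "max_model_len": int(max_model_len),
        "num_gpu_blocks": int(num_gpu_blocks),
        "dp_stats_address": dp_stats_address,
        "dtype": dtype,
        "vllm_version": vllm_version,
    }


def missing_fields(fields: Dict[str, Any]) -> List[str]:
    """Return the required field names that are absent from the payload dict."""
    return [name for name in REQUIRED_FIELDS if name not in fields]


# Default call: the marker string is emitted without the caller doing anything.
default = ready_fields(max_model_len=7168, dtype="bfloat16")

# Override: callers that know the real version can pass it explicitly.
override = ready_fields(max_model_len=7168, dtype="bfloat16", vllm_version="0.9.0")
```

Because `vllm_version` has a default, existing call sites keep working unchanged while still emitting all five keys.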

Test plan

Single-node Mac mini (M4 Pro, MLX backend), HEAD 328c99f with this patch applied:

  1. parallax run -m Qwen/Qwen3-0.6B -n 1 (scheduler tab)
  2. parallax join (worker tab)

Before patch:

  • Worker logs missing field vllm_version and exits with code 1 shortly after model load
  • curl localhost:3001/v1/chat/completions returns internal server error (500)

After patch:

  • Worker logs bootstrapped engines connected engine_count=1
  • Worker logs starting OpenAI server bind_address=[::1]:3000 model=Qwen/Qwen3-0.6B
  • curl localhost:3001/v1/chat/completions and curl localhost:3000/v1/chat/completions both return valid OpenAI-shaped JSON with model output:
    {"id":"chatcmpl-94ef2c77-...","model":"Qwen/Qwen3-0.6B",
     "choices":[{"index":0,"message":{"role":"assistant","reasoning":"..."}}],
     "usage":{"prompt_tokens":17,"total_tokens":67}}

Notes

  • This affects every Mac/MLX user following the README quickstart on current main — it's not a configuration issue.
  • Default "0.0.0+parallax" is semver-compatible (build-metadata suffix) and grep-able in logs.
  • Happy to adjust the default string or to source the value from parallax.__version__ if maintainers prefer.
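
If maintainers prefer a real version over the marker string, one possible shape is sketched below. It reads installed package metadata rather than `parallax.__version__`, which is a swapped-in approach, and the distribution name `"parallax"` is an assumption. `resolve_version` and `looks_like_semver` are hypothetical helpers, and the regex is a simplified approximation of the SemVer grammar, not the full specification:

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Marker proposed in this PR, used when no installed version can be found.
FALLBACK_VERSION = "0.0.0+parallax"

# Simplified SemVer shape: MAJOR.MINOR.PATCH, optional -prerelease, optional +build.
_SEMVER_RE = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def resolve_version(dist_name: str = "parallax") -> str:
    """Return the installed distribution's version, or the fallback marker."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed as a distribution (e.g. running from a source checkout).
        return FALLBACK_VERSION


def looks_like_semver(value: str) -> bool:
    """Rough check that a string has a SemVer-like shape."""
    return _SEMVER_RE.match(value) is not None
```

The resolved string would then be passed as `vllm_version=resolve_version()` at the call site, leaving the keyword default in place as a last resort.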

…onse

The Rust vllm-rs frontend introduced in GradientHQ#457 expects a 5-field
EngineCoreReadyResponse (max_model_len, num_gpu_blocks, dtype,
vllm_version, dp_stats_address), but engine_core_ready_payload in
Python only sends 4 fields, omitting vllm_version.

This causes msgpack decode to fail at handshake time on any
Mac/MLX worker, blocking all inference (e.g. /v1/chat/completions
returns 500). Verified by inspecting the vllm-rs binary strings,
which contain "struct EngineCoreReadyResponse with 5 elements"
followed by the field list ending with vllm_version.

Fix: add vllm_version as a keyword arg with a default of
"0.0.0+parallax" so the field is always emitted; the value can
be overridden by callers if needed. The Rust side does not
validate the version string content, only its presence.

Tested on a single-node mac mini (M4 Pro) cluster: after patch,
"bootstrapped engines connected engine_count=1" appears in the
worker log, and /v1/chat/completions returns a valid 200 response
with model output.