fix(handshake): add missing vllm_version field to EngineCoreReadyResponse payload by iamagenius00 · Pull Request #459 · GradientHQ/parallax

iamagenius00 · 2026-05-29T08:57:58Z

Summary

The Rust vllm-rs frontend added in #457 expects 5 fields in the EngineCoreReadyResponse handshake payload, but engine_core_ready_payload in Python emits only 4; vllm_version is missing. As a result, any Mac/MLX worker crashes at handshake time with:

Error: failed to connect to engine core
Caused by:
    messagepack decode failed for vllm_engine_core_client::protocol::handshake::EngineCoreReadyResponse:
    missing field `vllm_version`;
    value fallback: {"max_model_len": 7168, "num_gpu_blocks": 0, "dp_stats_address": nil, "dtype": "bfloat16"}

This blocks all inference (/v1/chat/completions, /v1/completions, etc. all return 500). The HTTP layer never gets a chance to route requests because the frontend ↔ engine IPC connection is never established.

Root cause

The shipped vllm-rs binary's strings show:

struct EngineCoreReadyResponse with 5 elements
  max_model_len, num_gpu_blocks, dtype, vllm_version, dp_stats_address

But src/parallax/server/engine_core_protocol.py::engine_core_ready_payload includes only 4 of those, omitting vllm_version. The "value fallback" dump in the decode error above ({max_model_len, num_gpu_blocks, dp_stats_address, dtype}) confirms that exactly these four keys are sent.
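The mismatch can be reproduced without the binary by diffing the two field lists quoted above. This is only a minimal sketch: the names EXPECTED_FIELDS, SENT_PAYLOAD, and missing_fields are hypothetical helpers for illustration, not part of parallax or vllm-rs.

```python
# Field names copied from the vllm-rs binary strings quoted above.
EXPECTED_FIELDS = frozenset(
    {"max_model_len", "num_gpu_blocks", "dtype", "vllm_version", "dp_stats_address"}
)

# Payload copied from the "value fallback" dump in the decode error.
SENT_PAYLOAD = {
    "max_model_len": 7168,
    "num_gpu_blocks": 0,
    "dp_stats_address": None,
    "dtype": "bfloat16",
}


def missing_fields(payload, expected=EXPECTED_FIELDS):
    """Return the expected field names absent from payload, sorted for stable output."""
    return sorted(expected - set(payload))


print(missing_fields(SENT_PAYLOAD))  # ['vllm_version']
```

The single missing name matches the field reported in the decode error.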

Fix

Add vllm_version as a keyword-only arg with default "0.0.0+parallax", and include it in the dict. The Rust side appears to only check field presence (no version comparison logic visible in the binary), so any string suffices; the "0.0.0+parallax" marker makes it clear in debug logs that this came from the Python MLX path rather than upstream vLLM. Callers can override if needed.

 def engine_core_ready_payload(
     *,
     max_model_len: int,
     dtype: Optional[str],
     num_gpu_blocks: int = 0,
     dp_stats_address: Optional[str] = None,
+    vllm_version: str = "0.0.0+parallax",
 ) -> bytes:
     """Build the registration payload sent by the engine DEALER socket."""
     return _pack_msgpack(
         {
             "max_model_len": int(max_model_len),
             "num_gpu_blocks": int(num_gpu_blocks),
             "dp_stats_address": dp_stats_address,
             "dtype": dtype,
+            "vllm_version": vllm_version,
         }
     )
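A regression check for this change could look roughly like the sketch below. To stay free of third-party dependencies it uses a hypothetical stand-in, ready_payload_fields, that builds the same dict as the patched function but skips the _pack_msgpack step; a real test would instead unpack the bytes returned by engine_core_ready_payload.

```python
from typing import Any, Dict, Optional


def ready_payload_fields(
    *,
    max_model_len: int,
    dtype: Optional[str],
    num_gpu_blocks: int = 0,
    dp_stats_address: Optional[str] = None,
    vllm_version: str = "0.0.0+parallax",
) -> Dict[str, Any]:
    """Hypothetical stand-in: the patched payload dict, before msgpack packing."""
    return {
        "max_model_len": int(max_model_len),
        "num_gpu_blocks": int(num_gpu_blocks),
        "dp_stats_address": dp_stats_address,
        "dtype": dtype,
        "vllm_version": vllm_version,
    }


def check_ready_payload_fields() -> None:
    """Assert that all five fields are present and the version can be overridden."""
    expected = {
        "max_model_len",
        "num_gpu_blocks",
        "dtype",
        "vllm_version",
        "dp_stats_address",
    }
    fields = ready_payload_fields(max_model_len=7168, dtype="bfloat16")
    assert set(fields) == expected
    assert fields["vllm_version"] == "0.0.0+parallax"

    overridden = ready_payload_fields(
        max_model_len=7168, dtype="bfloat16", vllm_version="1.2.3"
    )
    assert overridden["vllm_version"] == "1.2.3"


check_ready_payload_fields()
```

Asserting on the key set rather than the dict length means the check also fails if a field is renamed, which is the kind of drift between the Rust struct and the Python payload that caused this bug.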

Test plan

Single-node Mac mini (M4 Pro, MLX backend), HEAD 328c99f with this patch applied:

parallax run -m Qwen/Qwen3-0.6B -n 1 (scheduler tab)
parallax join (worker tab)

Before patch:

Worker logs missing field vllm_version and exits with code 1 shortly after model load
curl localhost:3001/v1/chat/completions → internal server error (500)

After patch:

Worker logs bootstrapped engines connected engine_count=1
Worker logs starting OpenAI server bind_address=[::1]:3000 model=Qwen/Qwen3-0.6B

curl localhost:3001/v1/chat/completions and curl localhost:3000/v1/chat/completions both return valid OpenAI-shaped JSON with model output:

{"id":"chatcmpl-94ef2c77-...","model":"Qwen/Qwen3-0.6B",
 "choices":[{"index":0,"message":{"role":"assistant","reasoning":"..."}}],
 "usage":{"prompt_tokens":17,"total_tokens":67}}

Notes

This affects every Mac/MLX user following the README quickstart on current main — it's not a configuration issue.
Default "0.0.0+parallax" is semver-compatible (build-metadata suffix) and grep-able in logs.
Happy to adjust the default string or to source the value from parallax.__version__ if maintainers prefer.
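If maintainers prefer a real version over the fixed marker, one possible shape is sketched below. It swaps in importlib.metadata for the parallax.__version__ lookup mentioned above, so it runs even where the package is not importable. The helper name default_vllm_version and the distribution name "parallax" are assumptions for illustration, not confirmed by the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Marker used by this PR when no better value is available.
FALLBACK_VERSION = "0.0.0+parallax"


def default_vllm_version(dist_name: str = "parallax") -> str:
    """Return the installed distribution's version, or the PR's fallback marker.

    dist_name is an assumed distribution name; adjust it to whatever the
    project is actually published as.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return FALLBACK_VERSION


print(default_vllm_version())
```

Since the Rust side appears to check only that the field is present, either value should satisfy the handshake; the fixed marker has the advantage of being easy to grep for in logs.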

…onse The Rust vllm-rs frontend introduced in GradientHQ#457 expects a 5-field EngineCoreReadyResponse (max_model_len, num_gpu_blocks, dtype, vllm_version, dp_stats_address), but engine_core_ready_payload in Python only sends 4 fields, omitting vllm_version. This causes msgpack decode to fail at handshake time on any Mac/MLX worker, blocking all inference (e.g. /v1/chat/completions returns 500). Verified by inspecting the vllm-rs binary strings, which contain "struct EngineCoreReadyResponse with 5 elements" followed by the field list ending with vllm_version. Fix: add vllm_version as a keyword arg with a default of "0.0.0+parallax" so the field is always emitted; the value can be overridden by callers if needed. The Rust side does not validate the version string content, only its presence. Tested on a single-node mac mini (M4 Pro) cluster: after patch, "bootstrapped engines connected engine_count=1" appears in the worker log, and /v1/chat/completions returns a valid 200 response with model output.

iamagenius00 requested a review from a team May 29, 2026 08:57

iamagenius00 mentioned this pull request May 29, 2026

[macOS] Single-host scheduler + worker setup fails due to libp2p mDNS unreliability; propose --local-only mode #460

Open

iamagenius00 wants to merge 1 commit into GradientHQ:main from iamagenius00:fix/engine-core-ready-vllm-version
