fix(native): support Metal custom V-cache SET_ROWS #9303

Merged
lalalune merged 1 commit into develop from fix/metal-v-cache-set-rows-9258 on Jun 24, 2026

lalalune (Member) commented Jun 24, 2026

Fixes #9258.

Summary

  • Points plugins/plugin-local-inference/native/llama.cpp at 6e83e4b9b808bc21100c7846fcc1acd0a0fa674c, which adds Metal SET_ROWS and copy/dequant support for manually selected custom V-cache tensors: tbq3_0, tbq4_0, and q4_polar.
  • Keeps the parent repo change small: the root commit updates the native llama.cpp submodule pointer and adds the human-verifiable evidence file at .github/issue-evidence/9258-metal-v-cache-set-rows.md.
  • Rebases the branch onto current origin/develop before final validation.

Root Cause

Manual custom V-cache selections could route cache updates through GGML_OP_SET_ROWS, but the Metal backend did not have destination kernels or dispatch wiring for TBQ3_0, TBQ4_0, or Q4_POLAR. With flash attention enabled, those custom cache tensors could also reach stock attention paths that need a backend-supported dequant/copy path first.
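
The failure mode described above can be pictured with a toy model of per-type kernel dispatch. This is only an illustrative sketch: the table contents, the function name `dispatch_set_rows`, and the kernel naming scheme are hypothetical and do not mirror the actual ggml-metal source.

```python
# Toy model of per-type SET_ROWS dispatch. All names here are hypothetical
# and chosen for illustration; this is not the real ggml-metal code.

# Illustrative subset of destination types that had a kernel before the fix.
KERNELS_BEFORE = {"f32", "f16", "q8_0"}

# The fix adds destination kernels for the three custom V-cache types.
KERNELS_AFTER = KERNELS_BEFORE | {"tbq3_0", "tbq4_0", "q4_polar"}


def dispatch_set_rows(dst_type: str, kernels: set[str]) -> str:
    """Return a (hypothetical) kernel name, or raise if none is registered."""
    if dst_type not in kernels:
        # Stands in for the abort seen when a cache update reaches an
        # operation the backend cannot execute for that destination type.
        raise NotImplementedError(f"SET_ROWS: no kernel for dst type {dst_type}")
    return f"kernel_set_rows_{dst_type}"
```

In this model, `dispatch_set_rows("tbq3_0", KERNELS_BEFORE)` raises, while the same call against `KERNELS_AFTER` returns a kernel name.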

Validation

Native macOS Metal:

  • cmake --build build-metal-9258 --target test-backend-ops llama-cli llama-completion llama-server -j 12 -> passed
  • xcrun -sdk macosx metal ... ggml-metal.metal ... -> passed, warnings only
  • test-backend-ops test -b MTL0 -o SET_ROWS -p "(tbq3_0|tbq4_0|q4_polar)" -> 12/12 passed
  • test-backend-ops test -b MTL0 -o CPY -p "(tbq3_0|tbq4_0|q4_polar)" -> 6/6 passed
  • Real GGUF llama-cli smoke runs with -fa on -ctv tbq3_0, tbq4_0, and q4_polar -> all generated tokens and exited 0
  • Real GGUF llama-completion smoke runs with the same three cache types -> all exited 0
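
The per-cache-type smoke runs listed above could be scripted roughly as follows. This is a sketch under assumptions: the binary path, model path, prompt, and token count (`-n 4`) are placeholders not taken from the PR; only `-fa on` and `-ctv <type>` come from the runs listed above.

```python
import shlex

# Hypothetical locations; adjust to the actual build directory and model file.
LLAMA_CLI = "build-metal-9258/bin/llama-cli"
MODEL = "model.gguf"

# The three custom V-cache types exercised in this PR.
CACHE_TYPES = ["tbq3_0", "tbq4_0", "q4_polar"]


def smoke_command(cache_type: str) -> list[str]:
    """Build the argument list for one flash-attention smoke run."""
    return [
        LLAMA_CLI,
        "-m", MODEL,
        "-fa", "on",          # flash attention enabled, as in the PR's runs
        "-ctv", cache_type,   # manual V-cache type override
        "-n", "4",            # assumed small token budget for a smoke test
        "-p", "Hello",        # assumed prompt
    ]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True)
    # to execute it and fail on a non-zero exit code.
    for cache_type in CACHE_TYPES:
        print(shlex.join(smoke_command(cache_type)))
```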

Node/web HTTP path:

  • llama-server built and served the real GGUF model on 127.0.0.1:19058
  • /completion HTTP requests for tbq3_0, tbq4_0, and q4_polar each returned JSON with tokens_predicted: 4 and exited 0
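
A request against the server above might look like the sketch below. The prompt text, the `n_predict` value, and the helper names `build_request` and `check_response` are assumptions for illustration; the address, the `/completion` path, and the `tokens_predicted` field come from the validation notes. The V-cache type is assumed to be chosen when `llama-server` is started, so the check would be repeated once per server launch.

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:19058"  # address used in the validation notes


def build_request(prompt: str, n_predict: int = 4) -> urllib.request.Request:
    """Build a POST request for the llama-server /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{SERVER}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_response(raw: bytes, expected_tokens: int = 4) -> bool:
    """Return True if the JSON reply reports the expected token count."""
    payload = json.loads(raw)
    return payload.get("tokens_predicted") == expected_tokens


if __name__ == "__main__":
    # Requires a running llama-server on SERVER.
    with urllib.request.urlopen(build_request("Hello"), timeout=60) as resp:
        print(check_response(resp.read()))
```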

iOS / Apple-platform packaging and runtime:

  • xcrun -sdk iphoneos metal ... ggml-metal.metal ... -> passed, warnings only
  • xcrun -sdk iphonesimulator metal ... ggml-metal.metal ... -> passed, warnings only
  • ELIZA_MTP_FORCE_REBUILD=1 node packages/app-core/scripts/build-llama-cpp-mtp.mjs --target ios-arm64-metal -> passed
  • ELIZA_MTP_FORCE_REBUILD=1 node packages/app-core/scripts/build-llama-cpp-mtp.mjs --target ios-arm64-simulator-metal -> passed
  • node packages/app-core/scripts/ios-xcframework/build-xcframework.mjs --output /tmp/LlamaCpp-9258.xcframework --verify -> passed; device and simulator kernel/runtime symbol audits passed, slices ios/arm64 and ios-simulator/arm64
  • bun run --cwd packages/app build:ios:local:sim -> passed with ** BUILD SUCCEEDED **
  • Physical iPhone XCTest via run-physical-device-smoke.mjs -> passed on an iPhone 16 Pro Max; testLibElizaInferenceAbiV1CallsMatchHeader, testLlamaKernelAndVoiceSymbolsResolve, and testMetalDeviceIsAvailableOnPhysicalIos passed, optional benchmark skipped because no model was bundled

Repo/package gates:

  • bun run --cwd plugins/plugin-native-llama test -> 4 files passed, 35 tests passed
  • bun run --cwd plugins/plugin-local-inference test -> 201 files passed, 1 skipped; 2065 tests passed, 13 skipped
  • bun install -> passed
  • bun run verify -> passed, 509 successful, 509 total

Evidence:

  • .github/issue-evidence/9258-metal-v-cache-set-rows.md

coderabbitai (Bot, Contributor) commented Jun 24, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cf1fffc0-e1f9-43b5-961d-6eae9ed2fd4e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

github-actions (Contributor) commented

❌ PR title does not match the required pattern. Please use one of these formats:

  • 'type: description' (e.g., 'feat: add new feature')
  • 'type(scope): description' (e.g., 'chore(core): update dependencies')
    Valid types: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert, release

lalalune force-pushed the fix/metal-v-cache-set-rows-9258 branch from 446b92a to ff9cdd8 June 24, 2026 09:46
lalalune marked this pull request as ready for review June 24, 2026 09:48

greptile-apps (Bot, Contributor) left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

lalalune changed the title from "[codex] fix Metal custom V-cache set rows" to "fix(native): support Metal custom V-cache SET_ROWS" Jun 24, 2026
lalalune force-pushed the fix/metal-v-cache-set-rows-9258 branch from ff9cdd8 to 1cf1fbe June 24, 2026 09:57

lalalune force-pushed the fix/metal-v-cache-set-rows-9258 branch from 1cf1fbe to fc6180f June 24, 2026 10:06

lalalune merged commit f567171 into develop Jun 24, 2026
27 checks passed
lalalune deleted the fix/metal-v-cache-set-rows-9258 branch June 24, 2026 10:09

claude (Bot, Contributor) commented Jun 24, 2026

Claude encountered an error (View job).

I'll analyze this and get back to you.

Development

Successfully merging this pull request may close these issues.

Metal: custom V-cache (tbq3_0/tbq4_0/q4_polar) SET_ROWS aborts under manual --cache-type-v override
