Implement llama-pull tool #16423

ericcurtin · 2025-10-04T17:02:38Z

Complete llama-pull tool with documentation

ericcurtin · 2025-10-04T17:03:24Z

This is now on a forked repo, a continuation of #16132

@ngxson PTAL

tools/pull/pull.cpp

ngxson · 2025-10-05T10:00:16Z

@tommarques56 Please stop spamming PRs with low quality reviews. You must ask the contributor for consensus before doing so, otherwise it will be considered as spam.

tommarques56 · 2025-10-05T10:08:56Z

@tommarques56 Please stop spamming PRs with low quality reviews. You must ask the contributor for consensus before doing so, otherwise it will be considered as spam.

Please see the other PRs reviewed by the bot! Many pull requests fix issues and address security concerns. The bot’s quality improves every day.

ngxson · 2025-10-05T10:18:49Z

@tommarques56 I already read all of your reviews in this PR and none of them make sense. Please remove them. I don't consent to the usage of AI review in this PR.

ericcurtin · 2025-10-05T10:30:34Z

This code review bot quality does seem poor, at least in this PR, I didn't look at the others. None of the comments seemed helpful.

I did find gemini code assist and https://sourcery.ai/ useful in multiple projects, namely RamaLama and Docker Model Runner. Both projects enabled them and both teams didn't regret.

Even those code review bots sometimes produce comments that don't make sense, but comments that don't make sense can be ignored.

But it's a separate conversation to this PR I guess.

And the llama.cpp maintainers should be on board with any tools used.

tommarques56 · 2025-10-05T10:34:45Z

@tommarques56 I already read all of your reviews in this PR and none of them make sense. Please remove them. I don't consent to the usage of AI review in this PR.

This code review bot quality does seem poor, at least in this PR, I didn't look at the others. None of the comments seemed helpful.

I did find gemini code assist and https://sourcery.ai/ useful in multiple projects, namely RamaLama and Docker Model Runner. Both projects enabled them and both teams didn't regret.

Even those code review bots sometimes produce comments that don't make sense, but comments that don't make sense can be ignored.

But it's a separate conversation to this PR.

And the llama.cpp maintainers should be on board with any tools used.

Hi, sorry, I’m a real user here. I’m a PhD student and I work on improving the bot. Theoretically, the bot shouldn’t access such a large repository and should stay within my repo. So, once again, I’m sorry. However, the bot’s reviews are often really good (for example, in another PR, it detected an XSS injection failure).

Sorry again,
Tom

ngxson · 2025-10-05T10:42:18Z

Without considering about the bot's quality, if you gonna use an automation tool (whatever behind it, either a dummy algorithmic or an LLM), you must at very least:

Ask for consensus
Provide an opt-out button

I think llama.cpp will eventually update the code of conduct to include this. Usage of bots are pretty much common nowadays and often more harm than good for maintainers. Just take a look at curl for example.

ngxson

Looks good, but still I think we should consider this concern from the original PR:

I think this can be a nice tool, but one concern is that the tool is not originally asked by users. Therefore, I'm doubt if users will actually know about and use it.

Tagging core maintainers to ask if it's good to add this tool @ggerganov @slaren @danbev

ngxson · 2025-10-05T10:45:09Z

tools/pull/README.md

+## Model Storage
+
+Downloaded models are stored in the standard llama.cpp cache directory:
+- Linux/macOS: `~/.cache/llama.cpp/`


on macos it's now ~/Library/Caches/llama.cpp

btw mabe a good idea to display the cache path in the --version

If I can help, for my phd, I will be graceful

macOS change made.

How would this look? Would we just add another line here with something like cache path: the path?

$ llama-pull --version ggml_metal_library_init: using embedded metal library ggml_metal_library_init: loaded in 0.006 sec ggml_metal_device_init: GPU name: Apple M4 Max ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. = true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 28991.03 MB register_backend: registered backend Metal (1 devices) register_device: registered device Metal (Apple M4 Max) register_backend: registered backend BLAS (1 devices) register_device: registered device BLAS (Accelerate) register_backend: registered backend CPU (1 devices) register_device: registered device CPU (Apple M4 Max) version: 6690 (9e71e8997) built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0

Yes I think adding one line at the bottom is ok:

version: 6690 (9e71e8997) built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0 model cache path: ...

@ngxson Showing the cache directory in --help or --version would improve discoverability. Consider adding a --cache-dir option for flexibility.

We should also clarify where the cache directory is stored by default on Windows.

@ericcurtin The terminal output you shared shows version info and device initialization, but doesn't include a cache path. For clarity, we could add a line like 'Cache path: ~/.cache/llama.cpp' in the version output.

This would help users quickly identify where models are stored on macOS.

@ngxson Adding version info would improve traceability. However, this is better suited for the tool's output rather than the README.

Consider adding this as part of the CLI's version output instead.

@ericcurtin This follow-up discusses an external PR (#16196) unrelated to the current README.md changes. Let's keep the review focused on the documentation updates for the llama-pull tool.

It's really not clear when we are talking to a human or a bot with this user tommarques56 account 🤣

danbev · 2025-10-05T15:23:51Z

Tagging core maintainers to ask if it's good to add this tool

I was surprised to have the bot adding reviews for a PR without asking me (author) for it. It reported mostly non-useful reviews for the pull request in question and cluttering up the PR.
So I would ask for this tool not to be applied to pull requests unless the author of the PR opts-in.

ericcurtin · 2025-10-05T15:28:26Z

Tagging core maintainers to ask if it's good to add this tool

I was surprised to have the bot adding reviews for a PR without asking me (author) for it. It reported mostly non-useful reviews for the pull request in question and cluttering up the PR. So I would ask for this tool not to be applied to pull requests unless the author of the PR opts-in.

I think we are starting to conflate two things here. @ngxson was referring to llama-pull tool here, rather than AI code review tool.

danbev · 2025-10-05T16:37:46Z

@ericcurtin Sorry about that, I read through the comments a little too quickly. I'll take a closer look at llama-pull tomorrow 👍

tools/pull/pull.cpp

tommarques56 · 2025-10-06T12:49:41Z

@ericcurtin Sorry about that, I read through the comments a little too quickly. I'll take a closer look at llama-pull tomorrow 👍

@danbev Thanks for the clarification, Tom. Please do take your time to review the full context tomorrow.

For future PRs, consider double-checking bot-generated comments before raising concerns, especially when they're based on limited context.

ericcurtin · 2025-10-07T10:34:04Z

@danbev done, PTAL

tools/pull/pull.cpp

tools/pull/README.md

Complete llama-pull tool with documentation Signed-off-by: Eric Curtin <[email protected]>

ericcurtin · 2025-10-07T12:13:37Z

@danbev review comments addressed PTAL

ggerganov · 2025-10-08T07:29:02Z

Probably need to discuss the longer-term ideas in #16393 (comment) before merging this new tool as it will set a new usage pattern.

ericcurtin requested a review from ggerganov as a code owner October 4, 2025 17:02

github-actions bot added the examples label Oct 4, 2025