Skip to content

Conversation

ericcurtin
Copy link
Collaborator

Complete llama-pull tool with documentation

@ericcurtin
Copy link
Collaborator Author

This is now on a forked repo, a continuation of #16132

@ngxson PTAL

@ngxson
Copy link
Collaborator

ngxson commented Oct 5, 2025

@tommarques56 Please stop spamming PRs with low quality reviews. You must ask the contributor for consensus before doing so, otherwise it will be considered as spam.

@tommarques56
Copy link

tommarques56 commented Oct 5, 2025

@tommarques56 Please stop spamming PRs with low quality reviews. You must ask the contributor for consensus before doing so, otherwise it will be considered as spam.

Please see the other PRs reviewed by the bot! Many pull requests fix issues and address security concerns. The bot’s quality improves every day.

@ngxson
Copy link
Collaborator

ngxson commented Oct 5, 2025

@tommarques56 I already read all of your reviews in this PR and none of them make sense. Please remove them. I don't consent to the usage of AI review in this PR.

@ericcurtin
Copy link
Collaborator Author

ericcurtin commented Oct 5, 2025

This code review bot quality does seem poor, at least in this PR, I didn't look at the others. None of the comments seemed helpful.

I did find gemini code assist and https://sourcery.ai/ useful in multiple projects, namely RamaLama and Docker Model Runner. Both projects enabled them and both teams didn't regret.

Even those code review bots sometimes produce comments that don't make sense, but comments that don't make sense can be ignored.

But it's a separate conversation to this PR I guess.

And the llama.cpp maintainers should be on board with any tools used.

@tommarques56
Copy link

@tommarques56 I already read all of your reviews in this PR and none of them make sense. Please remove them. I don't consent to the usage of AI review in this PR.

This code review bot quality does seem poor, at least in this PR, I didn't look at the others. None of the comments seemed helpful.

I did find gemini code assist and https://sourcery.ai/ useful in multiple projects, namely RamaLama and Docker Model Runner. Both projects enabled them and both teams didn't regret.

Even those code review bots sometimes produce comments that don't make sense, but comments that don't make sense can be ignored.

But it's a separate conversation to this PR.

And the llama.cpp maintainers should be on board with any tools used.

Hi, sorry, I’m a real user here. I’m a PhD student and I work on improving the bot. Theoretically, the bot shouldn’t access such a large repository and should stay within my repo. So, once again, I’m sorry. However, the bot’s reviews are often really good (for example, in another PR, it detected an XSS injection failure).

Sorry again,
Tom

@ngxson
Copy link
Collaborator

ngxson commented Oct 5, 2025

Without considering about the bot's quality, if you gonna use an automation tool (whatever behind it, either a dummy algorithmic or an LLM), you must at very least:

  • Ask for consensus
  • Provide an opt-out button

I think llama.cpp will eventually update the code of conduct to include this. Usage of bots are pretty much common nowadays and often more harm than good for maintainers. Just take a look at curl for example.

Copy link
Collaborator

@ngxson ngxson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but still I think we should consider this concern from the original PR:

I think this can be a nice tool, but one concern is that the tool is not originally asked by users. Therefore, I'm doubt if users will actually know about and use it.

Tagging core maintainers to ask if it's good to add this tool @ggerganov @slaren @danbev

## Model Storage

Downloaded models are stored in the standard llama.cpp cache directory:
- Linux/macOS: `~/.cache/llama.cpp/`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on macos it's now ~/Library/Caches/llama.cpp

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw mabe a good idea to display the cache path in the --version

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I can help, for my phd, I will be graceful

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macOS change made.

How would this look? Would we just add another line here with something like cache path: the path?

$ llama-pull --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name:   Apple M4 Max
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9  (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 28991.03 MB
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M4 Max)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M4 Max)
version: 6690 (9e71e8997)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think adding one line at the bottom is ok:

version: 6690 (9e71e8997)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0
model cache path: ...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngxson Showing the cache directory in --help or --version would improve discoverability. Consider adding a --cache-dir option for flexibility.

We should also clarify where the cache directory is stored by default on Windows.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericcurtin The terminal output you shared shows version info and device initialization, but doesn't include a cache path. For clarity, we could add a line like 'Cache path: ~/.cache/llama.cpp' in the version output.

This would help users quickly identify where models are stored on macOS.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngxson Adding version info would improve traceability. However, this is better suited for the tool's output rather than the README.

Consider adding this as part of the CLI's version output instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericcurtin This follow-up discusses an external PR (#16196) unrelated to the current README.md changes. Let's keep the review focused on the documentation updates for the llama-pull tool.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really not clear when we are talking to a human or a bot with this user tommarques56 account 🤣

@ericcurtin ericcurtin force-pushed the llama-pull branch 2 times, most recently from 0713710 to 65caf73 Compare October 5, 2025 13:40
@danbev
Copy link
Member

danbev commented Oct 5, 2025

Tagging core maintainers to ask if it's good to add this tool

I was surprised to have the bot adding reviews for a PR without asking me (author) for it. It reported mostly non-useful reviews for the pull request in question and cluttering up the PR.
So I would ask for this tool not to be applied to pull requests unless the author of the PR opts-in.

@ericcurtin
Copy link
Collaborator Author

Tagging core maintainers to ask if it's good to add this tool

I was surprised to have the bot adding reviews for a PR without asking me (author) for it. It reported mostly non-useful reviews for the pull request in question and cluttering up the PR. So I would ask for this tool not to be applied to pull requests unless the author of the PR opts-in.

I think we are starting to conflate two things here. @ngxson was referring to llama-pull tool here, rather than AI code review tool.

@danbev
Copy link
Member

danbev commented Oct 5, 2025

@ericcurtin Sorry about that, I read through the comments a little too quickly. I'll take a closer look at llama-pull tomorrow 👍

@tommarques56
Copy link

@ericcurtin Sorry about that, I read through the comments a little too quickly. I'll take a closer look at llama-pull tomorrow 👍

@danbev Thanks for the clarification, Tom. Please do take your time to review the full context tomorrow.

For future PRs, consider double-checking bot-generated comments before raising concerns, especially when they're based on limited context.

@ericcurtin
Copy link
Collaborator Author

@danbev done, PTAL

Complete llama-pull tool with documentation

Signed-off-by: Eric Curtin <[email protected]>
@ericcurtin
Copy link
Collaborator Author

@danbev review comments addressed PTAL

@ggerganov ggerganov requested review from ngxson and removed request for ggerganov October 8, 2025 07:27
@ggerganov
Copy link
Member

Probably need to discuss the longer-term ideas in #16393 (comment) before merging this new tool as it will set a new usage pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants