-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Implement llama-pull tool #16423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Implement llama-pull tool #16423
Conversation
@tommarques56 Please stop spamming PRs with low quality reviews. You must ask the contributor for consensus before doing so, otherwise it will be considered as spam. |
Please see the other PRs reviewed by the bot! Many pull requests fix issues and address security concerns. The bot’s quality improves every day. |
@tommarques56 I already read all of your reviews in this PR and none of them make sense. Please remove them. I don't consent to the usage of AI review in this PR. |
This code review bot quality does seem poor, at least in this PR, I didn't look at the others. None of the comments seemed helpful. I did find gemini code assist and https://sourcery.ai/ useful in multiple projects, namely RamaLama and Docker Model Runner. Both projects enabled them and both teams didn't regret. Even those code review bots sometimes produce comments that don't make sense, but comments that don't make sense can be ignored. But it's a separate conversation to this PR I guess. And the llama.cpp maintainers should be on board with any tools used. |
Hi, sorry, I’m a real user here. I’m a PhD student and I work on improving the bot. Theoretically, the bot shouldn’t access such a large repository and should stay within my repo. So, once again, I’m sorry. However, the bot’s reviews are often really good (for example, in another PR, it detected an XSS injection failure). Sorry again, |
Without considering about the bot's quality, if you gonna use an automation tool (whatever behind it, either a dummy algorithmic or an LLM), you must at very least:
I think llama.cpp will eventually update the code of conduct to include this. Usage of bots are pretty much common nowadays and often more harm than good for maintainers. Just take a look at curl for example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but still I think we should consider this concern from the original PR:
I think this can be a nice tool, but one concern is that the tool is not originally asked by users. Therefore, I'm doubt if users will actually know about and use it.
Tagging core maintainers to ask if it's good to add this tool @ggerganov @slaren @danbev
tools/pull/README.md
Outdated
## Model Storage | ||
|
||
Downloaded models are stored in the standard llama.cpp cache directory: | ||
- Linux/macOS: `~/.cache/llama.cpp/` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on macos it's now ~/Library/Caches/llama.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw mabe a good idea to display the cache path in the --version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I can help, for my phd, I will be graceful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macOS change made.
How would this look? Would we just add another line here with something like cache path: the path
?
$ llama-pull --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_device_init: GPU name: Apple M4 Max
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 28991.03 MB
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M4 Max)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M4 Max)
version: 6690 (9e71e8997)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think adding one line at the bottom is ok:
version: 6690 (9e71e8997)
built with Apple clang version 17.0.0 (clang-1700.0.13.5) for arm64-apple-darwin24.6.0
model cache path: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngxson Showing the cache directory in --help or --version would improve discoverability. Consider adding a --cache-dir option for flexibility.
We should also clarify where the cache directory is stored by default on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericcurtin The terminal output you shared shows version info and device initialization, but doesn't include a cache path. For clarity, we could add a line like 'Cache path: ~/.cache/llama.cpp' in the version output.
This would help users quickly identify where models are stored on macOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ngxson Adding version info would improve traceability. However, this is better suited for the tool's output rather than the README.
Consider adding this as part of the CLI's version output instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericcurtin This follow-up discusses an external PR (#16196) unrelated to the current README.md changes. Let's keep the review focused on the documentation updates for the llama-pull
tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really not clear when we are talking to a human or a bot with this user tommarques56 account 🤣
0713710
to
65caf73
Compare
I was surprised to have the bot adding reviews for a PR without asking me (author) for it. It reported mostly non-useful reviews for the pull request in question and cluttering up the PR. |
I think we are starting to conflate two things here. @ngxson was referring to llama-pull tool here, rather than AI code review tool. |
@ericcurtin Sorry about that, I read through the comments a little too quickly. I'll take a closer look at llama-pull tomorrow 👍 |
@danbev Thanks for the clarification, Tom. Please do take your time to review the full context tomorrow. For future PRs, consider double-checking bot-generated comments before raising concerns, especially when they're based on limited context. |
@danbev done, PTAL |
Complete llama-pull tool with documentation Signed-off-by: Eric Curtin <[email protected]>
@danbev review comments addressed PTAL |
Probably need to discuss the longer-term ideas in #16393 (comment) before merging this new tool as it will set a new usage pattern. |
Complete llama-pull tool with documentation