
Releases: CodeLinaro/llama.cpp

b4255

04 Dec 00:00
cc98896

vulkan: optimize and reenable split_k (#10637)

Use vector loads when possible in mul_mat_split_k_reduce. Use split_k
when there aren't enough workgroups to fill the shaders.
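Split-K splits the inner (K) dimension of a matrix multiplication into chunks. Each chunk is computed independently, the way separate workgroups would compute it, and a final reduce pass sums the partial results. The sketch below is a hypothetical CPU illustration of that idea, not the Vulkan shader code. The function name `matmul_split_k` and its row-major layout are assumptions, and the sketch does not model the vector loads mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU illustration of split-k (not the actual Vulkan shader).
// Computes C (M x N) = A (M x K) * B (K x N), all row-major, by splitting K
// into `splits` chunks. Each chunk fills its own partial M x N buffer, as an
// independent workgroup would, and a reduce pass sums the partial buffers.
std::vector<float> matmul_split_k(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t M, std::size_t N, std::size_t K,
                                  std::size_t splits) {
    assert(splits > 0 && A.size() == M * K && B.size() == K * N);
    const std::size_t chunk = (K + splits - 1) / splits;  // ceil(K / splits)
    std::vector<float> partial(splits * M * N, 0.0f);     // one buffer per split

    for (std::size_t s = 0; s < splits; ++s) {
        const std::size_t k0 = s * chunk;
        const std::size_t k1 = std::min(K, k0 + chunk);   // empty if k0 >= K
        float* out = partial.data() + s * M * N;
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = 0.0f;
                for (std::size_t k = k0; k < k1; ++k) {
                    acc += A[i * K + k] * B[k * N + j];
                }
                out[i * N + j] = acc;
            }
        }
    }

    // Reduce step: element-wise sum of all partial buffers.
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t s = 0; s < splits; ++s) {
        for (std::size_t idx = 0; idx < M * N; ++idx) {
            C[idx] += partial[s * M * N + idx];
        }
    }
    return C;
}
```

The extra partial buffers and the reduce pass are the cost of split-K. It pays off when M x N alone would produce too few workgroups to keep the GPU busy, which matches the condition described in the release note.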

b4242

03 Dec 00:51
642330a

llama : add enum for built-in chat templates (#10623)

* llama : add enum for supported chat templates

* use "built-in" instead of "supported"

* arg: print list of built-in templates

* fix test

* update server README
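One way to structure an enum of built-in chat templates is to pair it with a single name table. That table can drive both name lookup and a printed list of templates, similar to what the `arg` change does. The sketch below makes assumptions throughout: the enum, its identifiers, the table, and both helper functions are hypothetical, and the three template names are only an illustrative subset.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical enum; the real identifiers in llama.cpp may differ.
enum class chat_template { chatml, llama2, zephyr };

// Single source of truth mapping user-facing names to enum values.
static const std::map<std::string, chat_template> k_builtin_templates = {
    {"chatml", chat_template::chatml},
    {"llama2", chat_template::llama2},
    {"zephyr", chat_template::zephyr},
};

// Look up a template by name; std::nullopt means "not a built-in template".
std::optional<chat_template> chat_template_from_name(const std::string& name) {
    auto it = k_builtin_templates.find(name);
    if (it == k_builtin_templates.end()) return std::nullopt;
    return it->second;
}

// Names of all built-in templates (sorted, since std::map is ordered),
// e.g. for printing in a --help message.
std::vector<std::string> builtin_template_names() {
    std::vector<std::string> names;
    for (const auto& entry : k_builtin_templates) names.push_back(entry.first);
    return names;
}
```

Keeping the names in one table means a newly added template shows up in both the lookup and the printed list without a second edit.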

b4226

30 Nov 00:13
7cc2d2c

ggml : move AMX to the CPU backend (#10570)

* ggml : move AMX to the CPU backend

---------

Co-authored-by: Georgi Gerganov

b4224

29 Nov 19:42
3a8e9af

imatrix : support combine-only (#10492)

* imatrix-combine-only idea

* ensured that the behavior is consistent with the log

b4215

28 Nov 20:28
dc22344

ggml : remove redundant copyright notice + update authors

b4202

28 Nov 00:22
9f91251

common : fix duplicated file name with hf_repo and hf_file (#10550)

b4191

26 Nov 23:28
c9b00a7

ci : fix cuda releases (#10532)

b4174

26 Nov 04:04
0eb4e12

vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484)

vulkan-shaders-gen was not parsing the --no-clean argument correctly.
The previous code only handled arguments that take a value, and since
--no-clean takes no value, it was not parsed correctly. This commit makes
the parser handle arguments without values as well.
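A parser that supports both kinds of argument needs to know which options are value-less flags, so that it does not consume the next token as their value. The sketch below is a hypothetical illustration, not the actual vulkan-shaders-gen code. The `parse_args` function is an assumption, and the option names in the usage check are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser (not the vulkan-shaders-gen implementation).
// Options listed in `flags` take no value; every other "--name" option
// consumes the following argument as its value.
std::map<std::string, std::string> parse_args(const std::vector<std::string>& args,
                                              const std::set<std::string>& flags) {
    std::map<std::string, std::string> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg.rfind("--", 0) != 0) {
            throw std::invalid_argument("unexpected positional argument: " + arg);
        }
        if (flags.count(arg)) {
            out[arg] = "";              // flag present; nothing to consume
        } else if (i + 1 < args.size()) {
            out[arg] = args[++i];       // option with a value
        } else {
            throw std::invalid_argument("missing value for " + arg);
        }
    }
    return out;
}
```

If `--no-clean` were left out of `flags`, this parser would treat the token after it as its value. That is one way value-less arguments get misparsed when a parser assumes every argument takes a value.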

b4173

25 Nov 23:57
0cc6375

Introduce llama-run (#10291)

It's like simple-chat, but it uses smart pointers to avoid manual
memory cleanup, so the code has fewer memory leaks. It also avoids
printing multiple dots, splits the code into smaller functions, and
uses no exception handling.
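A common way to apply smart pointers to a C-style API is `std::unique_ptr` with a custom deleter, so a handle is released on every return path without manual cleanup or exceptions. The sketch below uses a hypothetical stand-in API (`c_handle`, `c_handle_create`, `c_handle_free`, and the `g_live_handles` counter are all assumptions); it is not llama-run's actual code and does not call real llama.cpp functions.

```cpp
#include <cassert>
#include <memory>

// Hypothetical C-style API: hands out raw handles that must be freed explicitly.
struct c_handle { int id; };
static int g_live_handles = 0;  // counts outstanding handles, for this demo only

c_handle* c_handle_create(int id) { ++g_live_handles; return new c_handle{id}; }
void c_handle_free(c_handle* h) { --g_live_handles; delete h; }

// Deleter so std::unique_ptr calls c_handle_free automatically.
struct c_handle_deleter {
    void operator()(c_handle* h) const { c_handle_free(h); }
};
using c_handle_ptr = std::unique_ptr<c_handle, c_handle_deleter>;

// Early returns are safe: the handle is freed on every path, with no
// cleanup label, goto, or try/catch.
int use_handle(int id, bool fail_early) {
    c_handle_ptr h(c_handle_create(id));
    if (!h) return -1;
    if (fail_early) return -1;  // h is released here automatically
    return h->id;
}
```

Because the deleter is a stateless type rather than a function pointer, the `unique_ptr` stays the size of a raw pointer.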

Signed-off-by: Eric Curtin

b4170

25 Nov 21:25
47f931c

server : enable cache_prompt by default (#10501)

ggml-ci