Note that whisper.cpp merged ggml-org/whisper.cpp#398 and ggml-org/whisper.cpp#2816 to add big-endian support with a similar theory of implementation.
If you intend to run the same test on your system, you should point the test at the big-endian test model.
We have an existing tool (gguf_convert_endian.py) that converts GGUF files between byte orders.
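For reference, the conversion script lives under gguf-py in the llama.cpp tree and takes the model path plus a target byte order, e.g. something like `python gguf_convert_endian.py model.gguf big` (I am going from memory here; check the script's `--help` for the exact path and arguments).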
This relates to #3552 and #3957.
I want to start a broader discussion about whether and how to support endianness (byte order) differences between host platforms and models, and to push for a coherent, consistent path forward.
While the codebase (as of at least 2376b77) compiles on big-endian targets (e.g. ppc64), it quickly runs into errors when loading little-endian .gguf models, as early as the built-in test suite.
This assertion ("garbage file version") fails due to a check that reads the file version in host byte order. The on-disk start of the file is the four-byte magic `GGUF` (`47 47 55 46`) followed by the version as a little-endian `uint32` (e.g. `03 00 00 00` for version 3).
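To illustrate the failure, here is a minimal standalone sketch (not the actual loader code) of what a native-endian read of that version field does:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // First 8 bytes of a little-endian GGUF v3 file: magic "GGUF",
    // then version = 3 stored as 03 00 00 00.
    const uint8_t header[8] = {'G', 'G', 'U', 'F', 0x03, 0x00, 0x00, 0x00};

    uint32_t version;
    memcpy(&version, header + 4, sizeof(version)); // native-endian read

    // Prints 3 on a little-endian host, but 50331648 (0x03000000) on a
    // big-endian host -- a "garbage file version".
    printf("version = %u\n", version);
    return 0;
}
```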
If the proposal to use `GGUF` magic for LE files and `FUGG` magic for BE files is to be implemented (a detection sketch follows the list below):

- Transmitting and storing one copy of each would require twice the disk space while providing no new information.
- Conversion tools will require maintenance as the format evolves, and will require robust validation to ensure they work flawlessly.
- Robust file format specifications will need to be laid out, including whether all fields/values/parameters, or only a subset, are to be interpreted as LE or BE.

I cannot speak to the performance impact on model loading times, or to whether conversion should happen on the fly.
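For concreteness, a loader under that proposal might detect the file's byte order from the magic roughly as follows. This is a hypothetical sketch; none of these names exist in llama.cpp:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical: which byte order a file declares via its magic.
enum class gguf_file_order { little, big, unknown };

static gguf_file_order detect_order(const uint8_t magic[4]) {
    if (memcmp(magic, "GGUF", 4) == 0) return gguf_file_order::little;
    if (memcmp(magic, "FUGG", 4) == 0) return gguf_file_order::big;
    return gguf_file_order::unknown; // not a GGUF file; caller rejects
}

static bool host_is_little_endian() {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

// The loader would then byte-swap every multi-byte field iff the
// file's declared order differs from the host's.
static bool needs_byteswap(gguf_file_order order) {
    return host_is_little_endian() != (order == gguf_file_order::little);
}
```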
If we instead patch GGUF files in place so that they match the host byte order, then it will be difficult to ensure such a conversion is done correctly, as the conversion tool(s) themselves will also need to be endian-correct whether they are run on an LE or BE host. This will not prevent cross-endian GGUF files from failing to run on the opposite host, and many tools will likely choke on them as well.
I believe ensuring that data are encoded and interpreted correctly is a software (inference engine) issue, not strictly a file format issue. The file format (the data exchange vehicle) should probably be consistent (effectively serialized) regardless of the byte order of the system that produced the GGUF; in other words, host byte order should not affect the data written to disk. The file format should specify how the data are to be interpreted.
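As a concrete example of what that looks like in code (a sketch, not a patch against the actual loader):

```cpp
#include <cstdint>

// Decode a uint32 that the format defines as little-endian on disk,
// byte by byte. The result is correct on any host, so the host's
// native byte order never leaks into the interpretation of the file.
static uint32_t read_u32_le(const uint8_t * p) {
    return (uint32_t) p[0]
         | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16
         | (uint32_t) p[3] << 24;
}
```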
Similarly, llama.cpp should behave the same (modulo any performance penalties) regardless of the input file. Suppose we fix the loader so that it is endian-agnostic: further byte-order assumptions elsewhere in the codebase are then revealed.
While BE systems are nowhere near as common as LE systems, they do exist. My proposal is to fix all byte-order assumptions in the llama.cpp codebase, but I want to know whether this topic has already been explored in depth and whether/where to direct my effort and others'.