bitnet-rs

Pre-alpha Rust inference engine and validation workspace for 1-bit BitNet LLMs.

Warning

Pre-alpha. Do not use in production.

BitNet-rs is not yet a general-purpose local chat engine. The project currently focuses on loader, tokenizer, kernel, receipt, and hardware validation. Coherent BitNet answer quality is still under validation, so generated BitNet text should be treated as diagnostic output.

What This Repo Is For

BitNet-rs is moving toward Rust-native BitNet inference, but the current repo is best understood as an inference-systems validation workspace. It is useful for contributors working on model loading, tokenization, quantization, kernel parity, hardware bring-up, receipts, and reproducible inference validation. It is not yet a polished end-user inference server.

What exists today:

strict GGUF loading and tokenizer metadata checks
I2_S / QK256 quantization and kernel infrastructure
scalar, AVX2, AVX-512, NEON, CUDA, OpenCL, OpenVINO, Metal, and NPU validation work
diagnostic answer-corpus and answer-parity receipts
receipts that record hardware identity, runtime identity, fallback behavior, and kernel coverage
dense SLM companion work used to validate the generation pipeline while BitNet model-artifact work continues
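
As a rough illustration of what a receipt that records hardware identity, runtime identity, fallback behavior, and kernel coverage might carry, here is a minimal sketch. The `Receipt` struct and all of its field and method names are hypothetical; they are not taken from the `bitnet-receipts` crate, whose real schema may differ.

```rust
/// Hypothetical receipt shape. Field names are illustrative only and are
/// not copied from the bitnet-receipts crate.
#[derive(Debug, Clone, PartialEq)]
pub struct Receipt {
    /// Hardware identity, for example a CPU or GPU model string.
    pub hardware: String,
    /// Runtime identity, for example a driver or runtime version string.
    pub runtime: String,
    /// Backend the caller asked for.
    pub requested_backend: String,
    /// Backend that actually executed the work.
    pub executed_backend: String,
    /// Names of the kernels that actually ran.
    pub kernels: Vec<String>,
}

impl Receipt {
    /// A fallback happened when the executed backend differs from the request.
    pub fn fell_back(&self) -> bool {
        self.requested_backend != self.executed_backend
    }

    /// True when the named kernel appears in the recorded kernel list.
    pub fn covers_kernel(&self, name: &str) -> bool {
        self.kernels.iter().any(|k| k == name)
    }
}
```

Recording the requested and executed backends separately makes a silent fallback visible: a run that asked for AVX2 but executed scalar code reports `fell_back() == true` instead of passing as an AVX2 result.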

Current Status

The repo has real inference infrastructure, but the Rust path does not yet produce coherent BitNet answers that the project supports. The Microsoft BitNet.cpp reference path can answer the tiny suite with the official I2_S GGUF when the missing pre-tokenizer is supplied from Microsoft's tokenizer assets.

Backend receipts remain useful for selected-device execution, tokenizer and prompt diagnostics, fallback behavior, and kernel coverage. They are not, by themselves, evidence that the Rust-generated text is a supported answer. See the model-artifact validation docs.

Capability Matrix

Area	State	What it means today
GGUF loading	Supported / hardening	Structural loading and metadata extraction are active work surfaces.
Tokenizer handling	Supported / hardening	Tokenizer metadata is checked strictly for answer-quality work.
I2_S BitNet32 CPU path	Diagnostic	CPU execution exists; coherent BitNet answer quality is still under validation.
I2_S QK256 CPU path	Diagnostic	Scalar, AVX2, and AVX-512 diagnostics have receipts; generated text quality is still under validation.
Scalar / SIMD parity	Diagnostic	Used for backend agreement checks and first-divergence debugging.
Dense SLM path	Early working	Companion/control path for generation-pipeline validation; not a BitNet quality result.
RTX 5070 Ti CUDA	Execution path validated / diagnostic	Packed BitNet CUDA has receipts through `CUDA-BITNET-009`; coherent CUDA answers and speed are not established.
Metal / OpenCL / OpenVINO / NPU	Probe / smoke	Hardware identity and narrow execution receipts exist; full BitNet answer quality is not established.
Cross-validation	Supported / hardening	Reference comparison infrastructure exists; model selection remains active work.
Honest-compute receipts	Supported	Receipts preserve backend, runtime, fallback, kernel, and timing metadata.
CLI run/chat	Diagnostic	Useful for exercising the pipeline; generated text is not yet a supported answer-quality surface.
Server / HTTP API	Incomplete	Health wiring exists; inference serving is not ready.
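
The first-divergence debugging mentioned in the Scalar / SIMD parity row can be pictured as an element-wise comparison of two backends' outputs. The sketch below is an assumption about how such a check could look; `first_divergence` is a hypothetical helper, not a function from this workspace.

```rust
/// Returns the index of the first element where `candidate` differs from
/// `reference` by more than `tol`, or `None` when the outputs agree.
/// Hypothetical helper for illustration only.
pub fn first_divergence(reference: &[f32], candidate: &[f32], tol: f32) -> Option<usize> {
    let common = reference.len().min(candidate.len());
    for i in 0..common {
        let diff = (reference[i] - candidate[i]).abs();
        // A NaN difference counts as a divergence; `diff > tol` alone would miss it.
        if diff.is_nan() || diff > tol {
            return Some(i);
        }
    }
    // If one output is longer, the first extra position is the divergence.
    if reference.len() != candidate.len() {
        Some(common)
    } else {
        None
    }
}
```

Reporting only the first mismatching index keeps the diagnostic small and points at the earliest position where two backends disagree, which is usually where debugging should start.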

First Diagnostic Run

Need	Start here
First token-generation walkthrough	docs/tutorials/first-inference.md
Real GGUF model walkthrough	docs/tutorials/real-gguf-model-inference.md
Model validation workflow	docs/howto/validate-models.md
GGUF loading details	docs/howto/gguf-model-validation-and-loading.md
CLI flags and receipt options	docs/reference/inference-cli-reference.md

The commands below are a smoke path for contributors, not an answer-quality quickstart.

Build the CPU CLI:

cargo build --locked -p bitnet-cli --no-default-features --features cpu,full-cli

Download the official Microsoft BitNet GGUF:

cargo run --locked -p xtask -- download-model --id microsoft/bitnet-b1.58-2B-4T-gguf

Run a diagnostic CPU generation path:

RUST_LOG=warn cargo run --locked -p bitnet-cli \
  --no-default-features --features cpu,full-cli -- run \
  --model models/microsoft-bitnet-b1.58-2B-4T-gguf/ggml-model-i2_s.gguf \
  --tokenizer models/microsoft-bitnet-b1.58-2B-4T-gguf/tokenizer.json \
  --prompt "What is 2+2?" \
  --max-new-tokens 8 \
  --strict-loader \
  --strict-tokenizer \
  --json-out target/bitnet/receipts/first-run.json

This exercises the model, tokenizer, generation, and receipt path. Treat the output as diagnostic evidence, not as a supported chat answer.
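
For readers new to the generation step that this command exercises, the following is a minimal sketch of a greedy autoregressive loop bounded by a maximum number of new tokens. It is not the `bitnet-inference` implementation: `argmax`, `generate_greedy`, the `eos` parameter, and the `forward` closure are all hypothetical stand-ins, and the real engine may sample differently.

```rust
/// Index of the largest non-NaN logit; ties keep the earliest index.
/// Returns `None` for empty or all-NaN input.
pub fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &v) in logits.iter().enumerate() {
        if v.is_nan() {
            continue;
        }
        match best {
            Some((_, b)) if v <= b => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// Greedy generation: repeatedly run the model on the tokens so far, pick the
/// highest-scoring next token, and stop at `max_new_tokens` or at `eos`.
/// `forward` is a hypothetical stand-in for a model forward pass.
pub fn generate_greedy<F>(prompt: &[u32], max_new_tokens: usize, eos: u32, mut forward: F) -> Vec<u32>
where
    F: FnMut(&[u32]) -> Vec<f32>,
{
    let mut tokens = prompt.to_vec();
    for _ in 0..max_new_tokens {
        let logits = forward(&tokens);
        let Some(next) = argmax(&logits) else { break };
        let next = next as u32;
        tokens.push(next);
        if next == eos {
            break;
        }
    }
    tokens
}
```

Because greedy selection is deterministic for a fixed model and prompt, a loop like this is a convenient basis for comparing backends token by token.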

Architecture

bitnet-tokenizers --------------------------------------+
                                                        |
bitnet-models  (GGUF loader, I2_S detection, metadata)  |
  -> bitnet-quantization  (I2_S / TL1 / TL2 / IQ2_S)    |
        -> bitnet-kernels (scalar / AVX2 / AVX-512 / NEON / CUDA)
                                                        v
                        bitnet-inference  (autoregressive engine)
                          -> bitnet-logits
                          -> bitnet-sampling
                          -> bitnet-generation
                          -> bitnet-prompt-templates
                          -> bitnet-receipts
                                                        |
                                      +-----------------+----------------+
                                      v                                  v
                                  bitnet-cli                       bitnet-server

The workspace contains roughly 200 crates. See docs/architecture-overview.md.
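
To give a feel for what the quantization layer deals with, here is a minimal sketch of packing ternary weights (-1, 0, +1) into 2 bits each, four per byte. This is an illustration only: the code mapping and bit order are assumptions, it ignores scales and block structure, and it is not the actual I2_S or QK256 on-disk layout. `pack_ternary` and `unpack_ternary` are hypothetical names.

```rust
/// Map one ternary weight to a 2-bit code: -1 -> 0, 0 -> 1, +1 -> 2.
/// The mapping is an assumption chosen for this sketch.
fn encode(w: i8) -> Option<u8> {
    match w {
        -1 => Some(0),
        0 => Some(1),
        1 => Some(2),
        _ => None,
    }
}

/// Pack ternary weights four per byte, lowest bits first.
/// Returns `None` if any weight is outside {-1, 0, +1}.
pub fn pack_ternary(weights: &[i8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; (weights.len() + 3) / 4];
    for (i, &w) in weights.iter().enumerate() {
        let code = encode(w)?;
        out[i / 4] |= code << ((i % 4) * 2);
    }
    Some(out)
}

/// Unpack the first `len` weights. Panics if `packed` holds fewer than `len`
/// weights; the unused code 3 would decode to 2 and is not validated here.
pub fn unpack_ternary(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| {
            let code = (packed[i / 4] >> ((i % 4) * 2)) & 0b11;
            code as i8 - 1
        })
        .collect()
}
```

A round trip through `pack_ternary` and `unpack_ternary` should return the original weights, which is the kind of property a scalar reference path can assert before SIMD or GPU kernels are compared against it.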

Hardware Validation

Hardware validation is organized by platform so backend identity, runtime identity, fallback status, and receipt coverage stay explicit.

Platform	Role
Intel 258V CPU	Lead BitNet CPU reference and AVX2 diagnostics.
i5-8250U CPU	Dense SLM CPU lead and low-power comparison.
Ryzen 9950X3D	AVX-512 support and high-performance CPU diagnostics.
RTX 5070 Ti	CUDA packed BitNet validation and future answer path.
Apple M4	Metal, MPSGraph, and CPU/NEON validation.
Arc A770	Discrete Intel GPU OpenCL/OpenVINO validation.
Arc 140V	Lunar Lake iGPU OpenCL/OpenVINO validation.
Intel NPU	OpenVINO NPU static-shape validation.

See docs/hardware/HARDWARE_MATRIX.md.

Building

cargo build --locked --no-default-features --features cpu
cargo build --locked -p bitnet-cli --no-default-features --features cpu,full-cli
cargo build --locked --no-default-features --features gpu

Optimized CPU build:

RUSTFLAGS="-C target-cpu=native -C opt-level=3 -C lto=thin" \
  cargo build --locked --release -p bitnet-cli --no-default-features --features cpu,full-cli

Feature Flags

Flag	Purpose
`cpu`	CPU inference and diagnostics.
`cuda`	CUDA backend surface.
`gpu`	GPU umbrella feature for accelerator backends currently wired through the workspace.
`full-cli`	Full CLI command set.
`ffi`	C++ FFI bridge for cross-validation.
`fixtures`	GGUF fixture-based integration tests.
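
Cargo feature flags like these are typically consumed in code through `cfg` checks. The sketch below shows the general mechanism, assuming a crate whose `Cargo.toml` declares `cpu` and `cuda` features; `compiled_backends` is a hypothetical function and is not part of this workspace.

```rust
/// Lists the backends compiled into this build, based on Cargo features.
/// Hypothetical helper; it assumes `cpu` and `cuda` are declared as features
/// in the crate's Cargo.toml.
pub fn compiled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for the given feature.
    if cfg!(feature = "cpu") {
        backends.push("cpu");
    }
    if cfg!(feature = "cuda") {
        backends.push("cuda");
    }
    backends
}
```

With `--no-default-features --features cpu`, as in the build commands above, a function like this would report only `cpu`, which makes the selected feature set explicit rather than implied by defaults.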

Nix: `nix develop && nix build .#bitnet-cli && nix flake check`. See the Nix guide.

Testing

cargo nextest run --locked --workspace --no-default-features --features cpu
cargo fmt --all -- --check
cargo clippy --locked --workspace --all-targets --no-default-features --features cpu -- -D warnings

The repository contains unit, property, snapshot, fixture, fuzz, BDD, receipt, and hardware-specific tests. Some tests are intentionally ignored with justification strings where hardware, model artifacts, or long-running evidence is required. See docs/development/test-suite.md.
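
Rust's built-in `#[ignore = "..."]` attribute is one way to attach a justification string to a test that needs special hardware. The sketch below shows the pattern on a toy function; `dot` and both test names are hypothetical, and the ignored test body is a placeholder rather than a real device test.

```rust
/// Toy scalar dot product over small integer weights, used only to give the
/// tests below something to check.
pub fn dot(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Runs everywhere: no hardware or model artifact needed.
    #[test]
    fn dot_matches_hand_computed_value() {
        assert_eq!(dot(&[1, -1, 0], &[1, 1, 1]), 0);
    }

    /// Skipped by default; the string records why the test is ignored.
    #[test]
    #[ignore = "requires a CUDA device"]
    fn cuda_dot_matches_scalar() {
        // Placeholder: a real test would run the device kernel here and
        // compare its result against the scalar `dot` above.
        assert_eq!(dot(&[1, 1], &[1, 1]), 2);
    }
}
```

With plain `cargo test`, ignored tests are skipped by default and can be run explicitly with `cargo test -- --ignored` on a machine that has the required hardware.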

Documentation

Section	Contents
docs/tutorials/	Getting started and first diagnostic runs.
docs/howto/	Install, run, export, validate, and cross-check.
docs/explanation/	Architecture and design notes.
docs/reference/	CLI, environment variables, quantization, and receipts.
docs/model-artifacts/	Model artifact status and validation.
docs/hardware/	Hardware validation and benchmark protocol.
docs/tracking/	Campaign state and active work.

What We Are Working On

Near-term work is focused on:

matching the Microsoft BitNet.cpp reference path from Rust CPU
preserving the reference runner, tokenizer, pre-tokenizer, and prompt template chain
enriching backend-neutral answer diagnostics and first-divergence receipts
validating coherent BitNet answer quality against a deterministic corpus
validating strict CPU/CUDA answer parity after the Rust CPU path works
qualifying throughput after answer quality works

Contributing

See CONTRIBUTING.md. Before opening a PR, run the local CI script:

./ci/local.sh

New internal maintenance commands belong in `xtask`. `bitnet-task` exists only to preserve legacy `scripts/*.sh` entrypoints while that migration is in flight.

See ROADMAP.md for project direction.

License

Dual-licensed under MIT and Apache 2.0.

