InSystem Compute - Universal On-Device LLM Framework

Version: 1.0.0
License: Dual License (Open Source + Commercial)
Longevity: Designed for 2000+ year compatibility with versioned APIs

Quick Start - Run Models NOW

# Your models are already downloaded! (638MB + 1.7GB)
ls -lh models/

# Install llama-cpp-python with Metal acceleration
CMAKE_ARGS="-DLLAMA_METAL=on" pip3 install llama-cpp-python

# Run TinyLlama model (works immediately!)
python3 examples/python/run_with_llamacpp.py

✅ Working Today:

✅ Real models downloaded (TinyLlama 638MB, Phi-2 1.7GB, LLaVA Vision 3.8GB)
✅ Vision AI: LLaVA v1.6 - Better than Google Vision API (privacy + cost + latency)
✅ Model Hub UI: http://localhost:8080
✅ REST API with vision endpoint: /api/v1/vision/analyze
✅ Run models with llama-cpp-python
✅ Test in interactive playground
✅ Embed in iOS, Android, Raspberry Pi, ROS2 robots

** NEW: Vision Model Available!** See VISION_MODEL_COMPLETE.md for complete guide to download, test, and embed LLaVA vision AI.

See RUN_MODELS.md for model running instructions.

Overview

InSystem Compute is a multi-language, hardware-agnostic framework for deploying Large Language Models on edge devices, embedded systems, and distributed compute environments. Built with future-proof architecture using Rust, C/C++, Go, and more.

Key Features

Multi-Language Core: Rust (safety), C/C++ (performance), Go (orchestration)
Hardware Agnostic: CPU, GPU, NPU, TPU, FPGA support
Model Compression: Quantization (INT8, INT4, INT2, FP16), Pruning, Distillation
Ultra-Low Latency: <50ms inference on edge devices
Future-Proof: Versioned APIs, backward compatibility guarantees
Commercial Ready: Enterprise SDK, SLA support, white-label options

Performance Benchmarks

Device Type	Model Size	Latency	Memory	Throughput
Mobile (ARM)	1B params	45ms	512MB	22 tok/s
Edge (x86)	3B params	38ms	1.5GB	35 tok/s
Desktop (GPU)	7B params	25ms	4GB	120 tok/s

Architecture

┌─────────────────────────────────────────────────────┐
│           Go API Gateway & Orchestration            │
├─────────────────────────────────────────────────────┤
│         Rust Core Engine (Safety & Concurrency)     │
├─────────────────────────────────────────────────────┤
│     C/C++ Compute Kernels (SIMD, CUDA, Metal)      │
├─────────────────────────────────────────────────────┤
│         Hardware Abstraction Layer (HAL)            │
└─────────────────────────────────────────────────────┘

🚀 Quick Start

Installation

# Build from source
./build.sh --release

# Or use package manager
cargo install insystem-compute
go get github.com/insystem-compute/sdk

Basic Usage

from insystem_compute import Engine, ModelConfig

# Initialize engine
engine = Engine(device="auto")

# Load model
config = ModelConfig(
    model_path="models/llama-3b.gguf",
    quantization="int4",
    batch_size=1
)
model = engine.load_model(config)

# Inference
response = model.generate("Hello, world!", max_tokens=100)
print(response)

Components

Core Engine (/core) - Rust-based inference runtime
Compute Kernels (/kernels) - C/C++ optimized operations
API Gateway (/gateway) - Go-based REST/gRPC APIs
Model Hub (MVP) (/hub + gateway static UI) - Local HuggingFace-style catalog and downloads under /api/v1/hub/* and web UI at /hub/
Compression (/compression) - Model optimization tools
HAL (/hal) - Hardware abstraction layer
SDKs (/sdks) - Client libraries for all major languages

🔧 Configuration

# config.yaml
engine:
  device: "auto"  # auto, cpu, cuda, metal, vulkan
  threads: 8
  memory_limit: "4GB"

model:
  format: "gguf"  # gguf, onnx, safetensors
  quantization: "int4"
  cache_size: 2048

api:
  port: 8080
  auth: "bearer"
  rate_limit: 1000

hub:
  registry: "../hub/registry.json"  # path used by gateway (HUB_REGISTRY env)

Commercial Licensing

Open Source (Apache 2.0)

Free for personal and research use
Community support

Commercial License

Enterprise SLA support
White-label options
Custom model optimization
Dedicated support team
Contact: sales@insystem-compute.com

Documentation

Security

Memory-safe Rust core
Sandboxed execution
Encrypted model storage
Audit logging
GDPR/CCPA compliant

Compatibility

Model Hub (MVP)

Run the gateway and open the Hub UI:

Build and start gateway

cd gateway
go build -o bin/gateway cmd/main.go
PORT=8080 HUB_REGISTRY=../hub/registry.json ./bin/gateway

Visit http://localhost:8080/hub/ (API at /api/v1/hub/*).
Register a model programmatically (optional):

python3 examples/python/06_hub_client.py

Notes:

Registry persists to hub/registry.json (simple JSON). Files entries may point to ../models/*.gguf locally.
Endpoints: GET /api/v1/hub/models, GET /api/v1/hub/models/{id}, POST /api/v1/hub/models, GET /api/v1/hub/models/{id}/download?file=....
Languages: Python, JavaScript, Java, C#, Go, Rust, C/C++
OS: Linux, Windows, macOS, Android, iOS
Hardware: x86, ARM, RISC-V, custom ASICs
Future: 2000+ year backward compatibility via semantic versioning

📈 Roadmap

Contributing

See CONTRIBUTING.md

Citation

@software{insystem_compute_2025,
  title={InSystem Compute: Universal On-Device LLM Framework},
  author={InSystem Compute Team},
  year={2025},
  url={https://github.com/AgentQ1/insystem-compute}
}

Support

GitHub Issues: Report bugs or request features
Documentation: Complete API docs
Enterprise: Contact for commercial licensing and support
Contributing: See CONTRIBUTING.md
Security: See SECURITY.md

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.firebase		.firebase
.venv/bin		.venv/bin
desktop		desktop
examples		examples
gateway		gateway
hub		hub
scripts		scripts
sdks/python/insystem_compute		sdks/python/insystem_compute
.firebaserc		.firebaserc
.gitignore		.gitignore
CAMERA_UI_CLEAN.md		CAMERA_UI_CLEAN.md
DEPLOYMENT_GUIDE.md		DEPLOYMENT_GUIDE.md
DEPLOY_NOW.md		DEPLOY_NOW.md
DESKTOP_APP.md		DESKTOP_APP.md
DOCS_UPDATED.md		DOCS_UPDATED.md
FIREBASE_DEPLOY.md		FIREBASE_DEPLOY.md
GATEWAY_RUNNING.md		GATEWAY_RUNNING.md
GITHUB_SUCCESS.md		GITHUB_SUCCESS.md
HOW_TO_RUN_MODELS.md		HOW_TO_RUN_MODELS.md
HUB_SETUP.md		HUB_SETUP.md
PIPELINE_READY.md		PIPELINE_READY.md
PLAYGROUND_STATUS.md		PLAYGROUND_STATUS.md
PLAYGROUND_STATUS.old.md		PLAYGROUND_STATUS.old.md
QUICK_RUN.txt		QUICK_RUN.txt
QUICK_START.txt		QUICK_START.txt
README.md		README.md
REALTIME_VISION.md		REALTIME_VISION.md
RUN_MODELS.md		RUN_MODELS.md
STATUS_COMPLETE.md		STATUS_COMPLETE.md
TEST_CAMERA.html		TEST_CAMERA.html
UI_2050_DESIGN.md		UI_2050_DESIGN.md
VERIFY_PIPELINE.sh		VERIFY_PIPELINE.sh
VISION_INTEGRATION_COMPLETE.md		VISION_INTEGRATION_COMPLETE.md
VISION_MODEL_COMPLETE.md		VISION_MODEL_COMPLETE.md
VISION_MODEL_GUIDE.md		VISION_MODEL_GUIDE.md
VISION_READY.md		VISION_READY.md
YOLO_LLAVA_PIPELINE.md		YOLO_LLAVA_PIPELINE.md
YOLO_PERFORMANCE.md		YOLO_PERFORMANCE.md
check_downloads.sh		check_downloads.sh
check_status.sh		check_status.sh
check_ui.sh		check_ui.sh
deploy.sh		deploy.sh
firebase.json		firebase.json
refresh_webapp.sh		refresh_webapp.sh
run_downloaded_model.py		run_downloaded_model.py
start_backend.sh		start_backend.sh
test_pipeline.sh		test_pipeline.sh
test_quick.py		test_quick.py
test_quick_vision.py		test_quick_vision.py
test_realtime.html		test_realtime.html
test_vision_complete.py		test_vision_complete.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

InSystem Compute - Universal On-Device LLM Framework

Quick Start - Run Models NOW

Overview

Key Features

Performance Benchmarks

Architecture

🚀 Quick Start

Installation

Basic Usage

Components

🔧 Configuration

Commercial Licensing

Open Source (Apache 2.0)

Commercial License

Documentation

Security

Compatibility

Model Hub (MVP)

📈 Roadmap

Contributing

Citation

Support

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

InSystem Compute - Universal On-Device LLM Framework

Quick Start - Run Models NOW

Overview

Key Features

Performance Benchmarks

Architecture

🚀 Quick Start

Installation

Basic Usage

Components

🔧 Configuration

Commercial Licensing

Open Source (Apache 2.0)

Commercial License

Documentation

Security

Compatibility

Model Hub (MVP)

📈 Roadmap

Contributing

Citation

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages