ragx is a minimal and hackable Retrieval-Augmented Generation (RAG) CLI tool designed for the terminal. It embeds your local files (or stdin), retrieves relevant chunks with KNN search, and queries OpenAI-compatible LLMs (local or remote) via a CLI/TUI workflow.
- Local-first or API-backed – run fully offline with Ollama, or connect to OpenAI/ChatGPT APIs.
- Minimal stack – Go + SQLite-vec + Ollama/OpenAI.
- Terminal native – query via CLI or lightweight TUI.
- Configurable – tweak system/user prompts and RAG parameters (chunk size, overlap, model, etc.).
Binaries are built for:
| OS | Architectures | Tested on |
|---|---|---|
| Linux | amd64, arm64 | ✅ Fedora 43, Debian 13 |
| macOS | amd64, arm64 | ❌ not tested |
| Windows | amd64, arm64 | ❌ not tested |
Important
Only Linux has been tested so far. Other platforms are built but unverified; feedback is welcome.
go install github.com/ladzaretti/ragx-cli/cmd/ragx@latest

curl -sSL https://raw.githubusercontent.com/ladzaretti/ragx-cli/main/install.sh | bash

The install script auto-detects your OS/arch, downloads the latest release, and installs `ragx` to `/usr/local/bin`.
Visit the Releases page for a list of available downloads.
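If the install worked, a quick sanity check is to run the `version` and `help` commands (both appear in the CLI help shown further down):

```sh
# confirm ragx is on your PATH and check the installed version
ragx version

# list all available subcommands and global flags
ragx --help
```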
- OpenAI API `v1` compatible: point `ragx` at any compatible base URL (local Ollama or remote).
- Per-provider/per-model overrides: control temperature and context length.
- TUI chat: a lightweight Bubble Tea interface for iterative querying.
- Terminal first: pipe text in, embed directories/files, and print results.
- Local knowledge bases: notes, READMEs, docs.
- Quick “ask my files” workflows.
flowchart TD
subgraph Ingest
A["Files / stdin"] --> B["Chunker"]
B --> C["Embedder"]
C --> D["Vector Database"]
end
subgraph Query
Q["User Query"] --> QE["Embed Query"]
QE --> D
D --> K["Top-K Chunks"]
K --> P["Prompt Builder (system + template + context)"]
P --> M["LLM"]
M --> R["Answer"]
end
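In practice, the whole pipeline above runs inside a single invocation. As a rough sketch (the path and question are placeholders, and a configured default chat model and embedding model are assumed, as noted in the examples section below), the command chunks and embeds the files, retrieves the top-K chunks for the question, builds the prompt, and prints the answer:

```sh
# ingest + query in one shot:
# files -> chunks -> embeddings -> top-K retrieval -> prompt -> LLM answer
ragx query ./docs -q "how do I configure a provider?"
```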
$ ragx --help
ragx is a terminal-first RAG assistant.
Embed data, run retrieval, and query local or remote OpenAI API-compatible LLMs.
Usage:
ragx [command]
Available Commands:
chat Start the interactive terminal chat UI
config Show and inspect configuration
help Help about any command
list List available models
query Embed data from paths or stdin and query the LLM
version Show version
Flags:
-h, --help help for ragx
Use "ragx [command] --help" for more information about a command.

The optional configuration file can be generated using the `ragx config generate` command:
[llm]
# Default model to use
default_model = ''
# LLM providers (uncomment and duplicate as needed)
# [[llm.providers]]
# base_url = 'http://localhost:11434'
# api_key = '<KEY>' # optional
# temperature = 0.7 # optional (provider default)
# Optional model definitions for context length control (uncomment and duplicate as needed)
# [[llm.models]]
# id = 'qwen:8b' # Model identifier
# context = 4096 # Maximum context length in tokens
# temperature = 0.7 # optional (model override)
[prompt]
# System prompt to override the default assistant behavior
# system_prompt = ''
# Go text/template for building the USER QUERY + CONTEXT block.
# Supported template vars:
# .Query — the user's raw query string
# .Chunks — slice of retrieved chunks (may be empty). Each chunk has:
# .ID — numeric identifier of the chunk
# .Source — source file/path of the chunk
# .Content — text content of the chunk
# user_prompt_tmpl = ''
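# Illustrative example only (not the built-in default), using the
# variables documented above; adjust the layout to taste:
# user_prompt_tmpl = '''
# Question:
# {{ .Query }}
#
# Context:
# {{ range .Chunks }}
# [{{ .ID }}] {{ .Source }}
# {{ .Content }}
# {{ end }}
# '''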
[embedding]
# Model used for embeddings
embedding_model = ''
# Number of characters per chunk
# chunk_size = 2000
# Number of characters overlapped between chunks (must be less than chunk_size)
# overlap = 200
# Number of chunks to retrieve during RAG
# top_k = 20
# [logging]
# Directory where log file will be stored (default: XDG_STATE_HOME or ~/.local/state/ragx)
# log_dir = '/home/gbi/.local/state/ragx'
# Filename for the log file
# log_filename = '.log'
# log_level = 'info'

Configuration values are resolved from the following sources (highest precedence first):

- CLI flags
- Environment variables (if supported)
  - OpenAI environment variables are auto-detected: `OPENAI_API_BASE`, `OPENAI_API_KEY`
- Config file
- Defaults
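For example, when targeting an OpenAI-compatible endpoint without touching the config file, the auto-detected environment variables can stand in for a provider entry (a sketch, assuming they are picked up as described above; the URL and key are placeholders):

```sh
# point ragx at an OpenAI-compatible endpoint via environment variables
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_API_KEY="sk-..."   # optional for local Ollama

# verify the provider is reachable and see which models it exposes
ragx list
```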
$ ragx list
http://localhost:11434/v1
jina/jina-embeddings-v2-base-en:latest
gpt-oss:20b
qwen3:8b-fast
nomic-embed-text:latest
mxbai-embed-large:latest
llama3.1:8b
qwen2.5-coder:14b
deepseek-r1:8b
qwen3:8b
nomic-embed-text:v1.5
hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL

$ ragx query readme.md \
--model qwen3:8b \
--embedding-model jina/jina-embeddings-v2-base-en:latest \
"how do i tune chunk_size and overlap for large docs?"
- Tune `chunk_size` (chars per chunk) and `overlap` (chars overlapped between chunks) via config or CLI flags. For large documents, increase `chunk_size` (e.g., 2000+ chars) but keep `overlap` < `chunk_size` (e.g., 200). Adjust based on your content type and retrieval needs. [1]
Sources:
[1] (chunk 2) /home/gbi/GitHub/Gabriel-Ladzaretti/ragx-cli/readme.md

These are minimal examples to get you started.
For detailed usage and more examples, run each subcommand with --help.
Note
These examples assume you already have a valid config file with at least one provider, a default chat model, and an embedding model set.
Tip
Generate a starter config with: ragx config generate > ~/.ragx.toml.
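One possible first-run flow, assuming `ragx` reads `~/.ragx.toml` as the tip above suggests: generate the starter config, fill in `base_url`, `default_model`, and `embedding_model`, then confirm the provider responds:

```sh
# write the commented starter config
ragx config generate > ~/.ragx.toml

# after editing base_url / default_model / embedding_model,
# check that the configured provider is reachable
ragx list
```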
# embed all .go files in current dir and query via --query/-q
ragx query . -M '\.go$' -q "<query>"
# embed a single file and provide query after flag terminator --
ragx query readme.md -- "<query>"
# embed stdin and provide query as the last positional argument
cat readme.md | ragx query "<query>"
# embed multiple paths with filter
ragx query docs src -M '(?i)\.(md|txt)$' -q "<query>"
# embed all .go files in current dir and start the TUI
ragx chat . -M '\.go$'
# embed multiple paths (markdown and txt) and start the TUI
ragx chat ./docs ./src -M '(?i)\.(md|txt)$'
# embed stdin and start the TUI
cat readme.md | ragx chat

- Chunking is currently character-based
  - adjust `chunk_size`/`overlap` for your content and use case.
- The vector database is ephemeral: created fresh per session and not saved to disk.
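Because the vector database lives only for the lifetime of a single run, repeated `ragx query` calls re-embed the same files each time. For iterative questioning over one corpus, it is likely cheaper to open a chat session, which keeps the embedded chunks around for the whole conversation (a usage sketch based on the session behavior described above):

```sh
# one-off question: embeds ./docs, answers, then discards the vector DB
ragx query ./docs -q "what does the overlap setting do?"

# iterative questioning: embed once, then ask follow-ups in the TUI
ragx chat ./docs
```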
