
ragx — a terminal-first local RAG assistant

Status: beta

ragx is a minimal and hackable Retrieval-Augmented Generation (RAG) CLI tool designed for the terminal. It embeds your local files (or stdin), retrieves relevant chunks with KNN search, and queries OpenAI-compatible LLMs (local or remote) via a CLI/TUI workflow.

Why ragx-cli?

  • Local first or API backed – run fully offline with Ollama, or connect to OpenAI/ChatGPT APIs.
  • Minimal stack – Go + SQLite-vec + Ollama/OpenAI.
  • Terminal native – query via CLI or lightweight TUI.
  • Configurable – tweak system/user prompts and RAG parameters (chunk size, overlap, model, etc.).

Supported platforms

Binaries are built for:

| OS      | Architectures | Tested on                |
| ------- | ------------- | ------------------------ |
| Linux   | amd64, arm64  | ✅ Fedora 43, Debian 13  |
| macOS   | amd64, arm64  | ❌ not tested            |
| Windows | amd64, arm64  | ❌ not tested            |

Important

Only Linux has been tested so far. Other platforms are built but unverified; feedback is welcome.

Installation

Option 1: Install via Go

 go install github.com/ladzaretti/ragx-cli/cmd/ragx@latest

Option 2: Install via curl

curl -sSL https://raw.githubusercontent.com/ladzaretti/ragx-cli/main/install.sh | bash

This auto-detects your OS/arch, downloads the latest release, and installs ragx to /usr/local/bin.
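
To confirm the binary landed on your PATH, check the reported version:

```sh
ragx version
```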

Option 3: Download a release

Visit the Releases page for a list of available downloads.

Overview

Key features

  • OpenAI API v1 compatible: point ragx at any compatible base URL (local Ollama or remote).
  • Per-provider/per-model overrides: control temperature and context length.
  • TUI chat: a lightweight Bubble Tea interface for iterative querying.
  • Terminal first: pipe text in, embed directories/files, and print results.

Use cases

  • Local knowledge bases: notes, READMEs, docs.
  • Quick “ask my files” workflows.

Pipeline Overview

flowchart TD
  subgraph Ingest
    A["Files / stdin"] --> B["Chunker"]
    B --> C["Embedder"]
    C --> D["Vector Database"]
  end

  subgraph Query
    Q["User Query"] --> QE["Embed Query"]
    QE --> D
    D --> K["Top-K Chunks"]
    K --> P["Prompt Builder (system + template + context)"]
    P --> M["LLM"]
    M --> R["Answer"]
  end
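The "Top-K Chunks" step above is a nearest-neighbour search over the embedded chunks. ragx itself keeps its vectors in SQLite-vec; the brute-force cosine scan below, and its type and function names, are only a hypothetical sketch of that retrieval idea in Go, not the project's implementation.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk pairs a piece of source text with its embedding vector.
type Chunk struct {
	ID      int
	Source  string
	Content string
	Vector  []float64
}

// cosine returns the cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k chunks most similar to the query vector.
func topK(query []float64, chunks []Chunk, k int) []Chunk {
	sorted := append([]Chunk(nil), chunks...)
	sort.Slice(sorted, func(i, j int) bool {
		return cosine(query, sorted[i].Vector) > cosine(query, sorted[j].Vector)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	chunks := []Chunk{
		{ID: 1, Source: "readme.md", Content: "chunking and overlap", Vector: []float64{1, 0}},
		{ID: 2, Source: "readme.md", Content: "installation steps", Vector: []float64{0, 1}},
	}
	for _, c := range topK([]float64{0.9, 0.1}, chunks, 1) {
		fmt.Printf("[%d] %s: %s\n", c.ID, c.Source, c.Content)
	}
}
```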

Usage

$ ragx --help
ragx is a terminal-first RAG assistant. 
Embed data, run retrieval, and query local or remote OpenAI API-compatible LLMs.

Usage:
  ragx [command]

Available Commands:
  chat        Start the interactive terminal chat UI
  config      Show and inspect configuration
  help        Help about any command
  list        List available models
  query       Embed data from paths or stdin and query the LLM
  version     Show version

Flags:
  -h, --help   help for ragx

Use "ragx [command] --help" for more information about a command.

Configuration file

The optional configuration file can be generated using the ragx config generate command:

[llm]
# Default model to use
default_model = ''
# LLM providers (uncomment and duplicate as needed)
# [[llm.providers]]
# base_url = 'http://localhost:11434'
# api_key = '<KEY>'		# optional
# temperature = 0.7		# optional (provider default)
# Optional model definitions for context length control (uncomment and duplicate as needed)
# [[llm.models]]
# id = 'qwen:8b'		# Model identifier
# context = 4096		# Maximum context length in tokens
# temperature = 0.7		# optional (model override)

[prompt]
# System prompt to override the default assistant behavior
# system_prompt = ''
# Go text/template for building the USER QUERY + CONTEXT block.
# Supported template vars:
#   .Query   — the user's raw query string
#   .Chunks  — slice of retrieved chunks (may be empty). Each chunk has:
#       .ID       — numeric identifier of the chunk
#       .Source   — source file/path of the chunk
#       .Content  — text content of the chunk
# user_prompt_tmpl = ''

[embedding]
# Model used for embeddings
embedding_model = ''
# Number of characters per chunk
# chunk_size = 2000
# Number of characters overlapped between chunks (must be less than chunk_size)
# overlap = 200
# Number of chunks to retrieve during RAG
# top_k = 20

# [logging]
# Directory where log file will be stored (default: XDG_STATE_HOME or ~/.local/state/ragx)
# log_dir = '/home/gbi/.local/state/ragx'
# Filename for the log file
# log_filename = '.log'
# log_level = 'info'
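
For reference, a minimal filled-in config for a local Ollama setup might look like the following. The model names are the ones used in the examples further down this readme; the context value and the custom user_prompt_tmpl are purely illustrative (the template only demonstrates the variables documented above and is not the built-in default).

```toml
[llm]
default_model = 'qwen3:8b'

[[llm.providers]]
base_url = 'http://localhost:11434'

[[llm.models]]
id = 'qwen3:8b'
context = 4096

[prompt]
# illustrative custom template; omit to keep the built-in default
user_prompt_tmpl = '''
QUERY:
{{ .Query }}

CONTEXT:
{{ range .Chunks }}
[{{ .ID }}] {{ .Source }}
{{ .Content }}
{{ end }}
'''

[embedding]
embedding_model = 'jina/jina-embeddings-v2-base-en:latest'
chunk_size = 2000
overlap = 200
top_k = 20
```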

Default prompts

System Prompt

User Query Template

Config precedence (highest -> lowest)

  • CLI flags
  • Environment variables (if supported)
    • OpenAI environment variables are auto-detected: OPENAI_API_BASE, OPENAI_API_KEY
  • Config file
  • Defaults
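
For example, the auto-detected OpenAI variables can point a single invocation at a different endpoint without editing the config file (the URL and key below are placeholders):

```sh
OPENAI_API_BASE='https://api.openai.com/v1' OPENAI_API_KEY='<KEY>' ragx list
```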

Examples

Listing available models

$ ragx list
http://localhost:11434/v1
      jina/jina-embeddings-v2-base-en:latest
      gpt-oss:20b
      qwen3:8b-fast
      nomic-embed-text:latest
      mxbai-embed-large:latest
      llama3.1:8b
      qwen2.5-coder:14b
      deepseek-r1:8b
      qwen3:8b
      nomic-embed-text:v1.5
      hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL

TUI session

(screenshot: ragx TUI chat session)

CLI one-shot query

$ ragx query readme.md \
            --model qwen3:8b \
            --embedding-model jina/jina-embeddings-v2-base-en:latest \
            "how do i tune chunk_size and overlap for large docs?"
- Tune `chunk_size` (chars per chunk) and `overlap` (chars overlapped between chunks) via config or CLI flags. For large documents, increase `chunk_size` (e.g., 2000+ chars) but keep `overlap` < `chunk_size` (e.g., 200). Adjust based on your content type and retrieval needs. [1]

Sources:
[1] (chunk 2) /home/gbi/GitHub/Gabriel-Ladzaretti/ragx-cli/readme.md

These are minimal examples to get you started.
For detailed usage and more examples, run each subcommand with --help.

Common command patterns

Note

These examples assume you already have a valid config file with at least one provider, a default chat model, and an embedding model set.

Tip

Generate a starter config with: ragx config generate > ~/.ragx.toml.

  # embed all .go files in current dir and query via --query/-q
  ragx query . -M '\.go$' -q "<query>"

  # embed a single file and provide query after flag terminator --
  ragx query readme.md -- "<query>"

  # embed stdin and provide query as the last positional argument
  cat readme.md | ragx query "<query>"

  # embed multiple paths with filter
  ragx query docs src -M '(?i)\.(md|txt)$' -q "<query>"

  # embed all .go files in current dir and start the TUI
  ragx chat . -M '\.go$'

  # embed multiple paths (markdown and txt) and start the TUI
  ragx chat ./docs ./src -M '(?i)\.(md|txt)$'

  # embed stdin and start the TUI
  cat readme.md | ragx chat

Notes & Limitations

  • Chunking is currently character-based (see the sketch after this list)
    • adjust chunk_size/overlap for your content and use case.
  • The vector database is ephemeral: created fresh per session and not saved to disk.
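
A rough sketch of what character-based chunking with overlap implies (illustrative only, not the project's actual chunker): each chunk is at most chunk_size characters and shares overlap characters with its predecessor, so the stride between chunk starts is chunk_size - overlap. With the defaults above (chunk_size 2000, overlap 200), the stride is 1800 characters.

```go
package main

import "fmt"

// chunkByChars splits text into chunks of at most size characters,
// where consecutive chunks share overlap characters.
// overlap must be smaller than size.
func chunkByChars(text string, size, overlap int) []string {
	runes := []rune(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	// "abcdefghij" with size 4 and overlap 1 yields "abcd", "defg", "ghij".
	for i, c := range chunkByChars("abcdefghij", 4, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```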
