llauncher

An MCP-first launcher and management tool for llama.cpp llama-server instances. The MCP contract is the product; the HTTP Agent, llauncher CLI, and Streamlit UI are co-equal consumers of the same llauncher/operations/ service layer — three surfaces over one core, designed for both programmatic control (LLM agents, multi-node automation) and human operators.

Features

Core (`llauncher/operations/`)

The stateless service layer that every surface delegates to (ADR-008). Adding a verb here surfaces it across all four boundaries automatically.

Verbs: start, stop, swap, cancel, delete_model, list_orphans
Pre-flight seams: model-health probe and VRAM estimation, attachable as optional callables on swap()
ADR-010 port discipline: every verb takes port as a required argument — no auto-allocation, no env-var fallback
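
A minimal sketch of how the port discipline and pre-flight seams could look at a call site. The `swap` function, its parameter names, and the `PreflightFailed` exception are hypothetical and do not reproduce the real signatures in llauncher/operations/. The sketch only illustrates two ideas from the list above: port is a required keyword argument, and pre-flight checks are optional callables.

```python
from typing import Callable, Optional


class PreflightFailed(Exception):
    """Hypothetical error raised when an attached pre-flight check rejects a swap."""


# Hypothetical seam type: takes a model name, returns True when the check passes.
Preflight = Callable[[str], bool]


def swap(
    model_name: str,
    *,
    port: int,  # required keyword argument: no default, no env-var fallback
    health_probe: Optional[Preflight] = None,
    vram_check: Optional[Preflight] = None,
) -> dict:
    """Illustrative stand-in for a swap verb; not the real implementation."""
    for label, check in (("model-health", health_probe), ("vram", vram_check)):
        if check is not None and not check(model_name):
            raise PreflightFailed(f"{label} check rejected {model_name!r}")
    # A real verb would replace the running server on `port` here.
    return {"model": model_name, "port": port, "status": "swapped"}
```

Calling `swap("mistral")` without `port=` raises `TypeError`, which is the behaviour a "no auto-allocation" rule implies at the Python level.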

MCP Server

Canonical surface for LLM agents and automation. Stdio transport; full read + mutate coverage of the core verbs.

Discovery: list_models, get_model_config
Lifecycle: start_server, stop_server, swap_server, cancel_server, server_status, get_server_logs, list_orphans
Configuration CRUD: add_model, update_model_config, delete_model, validate_config

HTTP Agent

Same verbs over REST for multi-node setups (ADR-009 hub-spoke). Port-keyed routes (/start/{port}, /swap/{port}, /stop/{port}, /cancel/{port}, /footer-context/{port}) plus /status, /models, /models/health. Token-protected when bound off-loopback (ADR-003).
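
The route layout above can be sketched as a small URL builder. The helper name `agent_url` and the split into two route sets are assumptions made for illustration. The HTTP method and the way the token is presented are not stated here, so the sketch stops at building the URL.

```python
from typing import Optional
from urllib.parse import urlunsplit

# Route names taken from the paragraph above.
PORT_KEYED = {"start", "swap", "stop", "cancel", "footer-context"}
NODE_WIDE = {"status", "models", "models/health"}


def agent_url(host: str, agent_port: int, route: str,
              server_port: Optional[int] = None) -> str:
    """Build the URL for an agent route; port-keyed routes need server_port."""
    if route in PORT_KEYED:
        if server_port is None:
            raise ValueError(f"/{route} is port-keyed; server_port is required")
        path = f"/{route}/{server_port}"
    elif route in NODE_WIDE:
        path = f"/{route}"
    else:
        raise ValueError(f"unknown route: {route}")
    return urlunsplit(("http", f"{host}:{agent_port}", path, "", ""))
```

For example, `agent_url("192.168.1.100", 8765, "start", 8081)` yields `http://192.168.1.100:8765/start/8081`.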

Streamlit UI

Web dashboard for human operators. Four tabs: Dashboard (read-only running view), Models (config CRUD + per-model start/stop/swap with explicit port picker), Nodes (peer registry), Audit (local audit-log tail).

CLI (`llauncher`)

Typer command-line surface, co-equal with MCP and UI. Subcommand groups: model (list, info), server (start, stop, cancel, status), orphan (list), node (add, list, remove, status), config (path, validate). Rich tables for human output and --json on every group for scripting.

Configuration

Config Persistence: Store configurations in ~/.llauncher/config.json (single source of truth)
Validation: Model paths verified, port conflicts detected, blacklists enforced

Installation

# Clone the repository
git clone https://github.com/shanevcantwell/llauncher
cd llauncher

# Install in development mode (with UI)
pip install -e ".[ui]"

# Optional: Install test dependencies
pip install -e ".[test]"

Windows Notes

If you see warnings like WARNING: Ignoring invalid distribution ~ during install:

# Clean up corrupted site-packages and reinstall
cd github\llauncher
rmdir /s /q .venv
python -m venv .venv
.\.venv\Scripts\activate
pip install -e ".[ui]"

Quick Start

The runner scripts are the easiest way to get started.

The dashboard requires the local agent to be running. Start the agent first (in its own terminal), then the dashboard in a second terminal. The UI deliberately does not auto-spawn the agent — see ADR-009 and the "Why doesn't the UI start the agent for me?" expander rendered on the dashboard when the agent is down.

Linux/macOS:

./run.sh install     # Set up virtual environment and install
./run.sh agent       # Terminal 1: start agent in foreground
./run.sh ui          # Terminal 2: start dashboard (requires agent)
./run.sh stop        # Stop running agent
# Optional:
./run.sh agent-bg    # Start agent detached (logs to agent.log)
./run.sh discover    # List discovered launch scripts

Windows:

run.bat install      :: Set up virtual environment and install
run.bat agent        :: Terminal 1: start agent in foreground
run.bat ui           :: Terminal 2: start dashboard (requires agent)
run.bat stop         :: Stop running agent
:: Optional:
run.bat agent-bg     :: Start agent detached (logs to agent.log)
run.bat discover     :: List discovered launch scripts

Running the agent as a service

For a persistent install that survives reboots and restarts on crash, the agent ships with installers for systemd (Linux, user-mode) and NSSM (Windows). See docs/operations/run-as-a-service.md. The UI is not service-managed by design — it's interactive and you launch it on demand.

Usage

MCP Server

Start the MCP server:

llauncher-mcp

Or configure in your MCP client (e.g., Claude Code):

{
  "mcpServers": {
    "llauncher": {
      "command": "llauncher-mcp",
      "args": []
    }
  }
}

Trust boundary (stdio only). The MCP server speaks the MCP stdio transport and has no authentication of its own — it implicitly trusts whatever process spawned it over the stdio pipe (typically your MCP client, e.g. Claude Desktop / Claude Code). There is no network listener for MCP. Vetting the MCP client you hand these tools to is the operator's responsibility; llauncher cannot distinguish a benign caller from a malicious one once the stdio pipe is open. See docs/plans/security-hardening-plan.md §2.2 (control C5) for the threat-model rationale.

Available MCP Tools

| Tool | Description |
| --- | --- |
| `list_models` | List all configured models with current status (running/stopped) |
| `get_model_config` | Get full configuration details for a specific model |
| `start_server` | Start a llama-server instance on a given port (`model_name` + `port` required; ADR-010) |
| `stop_server` | Stop a running server by port number |
| `swap_server` | Atomically swap models on a port with rollback guarantee (ADR-011) |
| `cancel_server` | Cancel an in-flight start/swap on a port (ADR-014) |
| `server_status` | Get status summary of all running servers |
| `get_server_logs` | Fetch recent log lines from a running server |
| `list_orphans` | List unmanaged `llama-server` processes on the local node (ADR-015) |
| `update_model_config` | Update an existing model's configuration |
| `validate_config` | Validate a configuration without applying it |
| `add_model` | Add a new model configuration to the store |
| `delete_model` | Delete a model configuration (refuses if running; ADR-008 §4.1) |
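
For orientation, this is roughly what a `start_server` call looks like on the wire. MCP tool invocations are JSON-RPC 2.0 `tools/call` requests, and the stdio transport sends one JSON message per line. The helper name `tools_call` is an assumption. The argument names `model_name` and `port` come from the table above. An MCP client normally builds this message for you after the initialization handshake, so treat this as a sketch rather than something to send by hand.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no raw newlines by default, so the result fits on one line.
    return json.dumps(message)


print(tools_call(1, "start_server", {"model_name": "mistral", "port": 8081}))
```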

Streamlit UI

Start the UI using the runner script (recommended):

Linux/macOS:

./run.sh ui

Windows:

run.bat ui

Bind to loopback (no built-in auth). Streamlit binds wherever the operator launches it; the default is loopback. The runner scripts launch with --server.address 127.0.0.1, and that is the recommended invocation for typical single-operator use. The dashboard itself has no built-in authentication — anything that can reach the port can drive every mutate path (start/stop servers, edit configs, manage nodes). Do not expose it beyond loopback without an operator-supplied gateway in front: Tailscale, an SSH tunnel, or a reverse proxy that enforces auth. Passing --server.address 0.0.0.0 (or a LAN IP) without one of those is equivalent to publishing an unauthenticated admin console on your network. See docs/plans/security-hardening-plan.md §2.8 (control C12) for the threat-model rationale.

Dashboard Tab

Read-only running view (no mutate verbs live here per M4 Slice 13 / #50). Status indicators (🟢 Running / ⚫ Stopped), uptime, and live log tail for each active server. Use the Models tab to start/stop/swap.

Models Tab

Config CRUD plus the per-model verb buttons. Add / edit / delete configurations and drive Start, Stop, Swap against the selected target node. Includes the explicit port picker (ui/components/port_picker.py) — ADR-010 requires the operator to choose the port at every call site; there is no auto-allocation or remembered default.

Nodes Tab

Peer registry for multi-node setups. Add / list / remove remote agent nodes, test connectivity, and observe status. The sidebar node_selector (ui/components/node_selector.py) chooses which node the Models tab acts against.

Audit Tab

Tails the local audit log at LAUNCHER_AUDIT_PATH (~/.llauncher/audit.jsonl by default). Read-only view of commanded vs. observed events. Remote-node audit access is deferred per #64.
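
A minimal sketch of tailing the audit log outside the UI, assuming only what is stated above: the file is JSON Lines and its location comes from LAUNCHER_AUDIT_PATH with ~/.llauncher/audit.jsonl as the default. The function names are hypothetical, and the sketch makes no assumption about which fields each event carries.

```python
import json
import os
from collections import deque
from pathlib import Path


def audit_path() -> Path:
    """Resolve the audit log path from the environment, else the default."""
    default = Path.home() / ".llauncher" / "audit.jsonl"
    return Path(os.environ.get("LAUNCHER_AUDIT_PATH", default))


def tail_audit(path: Path, n: int = 20) -> list:
    """Return the last n events from a JSON Lines file, oldest first."""
    with open(path, encoding="utf-8") as fh:
        last_lines = deque(fh, maxlen=n)  # keeps only the final n lines
    events = []
    for line in last_lines:
        line = line.strip()
        if line:  # skip blank lines
            events.append(json.loads(line))
    return events
```

`tail_audit(audit_path(), 5)` would return the five most recent events as dictionaries.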

CLI

The llauncher Typer CLI is a co-equal consumer of llauncher/operations/ alongside the MCP server, HTTP Agent, and Streamlit UI. Every group supports a --json / -j flag for machine-readable output; the default is a Rich-rendered color table for human use.

Subcommand groups:

# Model configurations (read-only)
llauncher model list
llauncher model info mistral-7b

# Server lifecycle — port is required on start (ADR-010)
llauncher server start mistral-7b --port 8081
llauncher server stop 8081
llauncher server cancel 8081         # ADR-014: signals an in-flight start/swap
llauncher server status --json

# Orphans — unmanaged llama-server processes (ADR-015, read-only)
llauncher orphan list

# Remote nodes (ADR-009)
llauncher node add my-server --host 192.168.1.100 --port 8765
llauncher node list
llauncher node status --all
llauncher node remove my-server

# Configuration store
llauncher config path                # print path to config.json
llauncher config validate mistral-7b

Each group also accepts --help. The runner scripts (./run.sh agent, ./run.sh ui) remain the easiest way to launch the agent and dashboard; the CLI subcommands above act against an already-running stack.
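
For scripting, the --json flag can be wrapped in a small helper. This is a sketch: the helper name `llauncher_json` and the injectable `run` parameter are assumptions, and the shape of the returned JSON is not documented here, so the caller should inspect it before relying on specific keys.

```python
import json
import subprocess


def llauncher_json(args, run=subprocess.run):
    """Run `llauncher <args> --json` and return the parsed output."""
    proc = run(
        ["llauncher", *args, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

With the stack running, `llauncher_json(["server", "status"])` executes the same command as `llauncher server status --json` from the examples above.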

Configuration

Model configurations live in ~/.llauncher/config.json. Edit the file directly, or manage entries through the UI or the MCP tools.

Example config entry:

{
  "mistral": {
    "name": "mistral",
    "model_path": "/path/to/model.gguf",
    "mmproj_path": null,
    "n_gpu_layers": 255,
    "ctx_size": 131072,
    "threads": 8,
    "threads_batch": 8,
    "ubatch_size": 512,
    "batch_size": null,
    "flash_attn": "on",
    "no_mmap": false,
    "cache_type_k": "f32",
    "cache_type_v": "f32",
    "n_cpu_moe": null,
    "parallel": 1,
    "temperature": null,
    "top_k": null,
    "top_p": null,
    "min_p": null,
    "repeat_penalty": null,
    "reverse_prompt": null,
    "mlock": false,
    "extra_args": ""
  }
}

Per ADR-010, port is supplied at every call site (UI port picker, CLI --port, MCP port arg, HTTP /start/{port} route) and is not persisted in the config. Legacy default_port entries in config.json are silently dropped on load.
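
The legacy-key handling can be pictured as a filter applied while loading. This is a sketch under assumptions: the function name `load_model_configs` is hypothetical, and the real loader validates entries against the Pydantic ModelConfig, which is omitted here.

```python
import json


def load_model_configs(raw: str) -> dict:
    """Parse config.json text and drop any legacy `default_port` keys."""
    data = json.loads(raw)
    return {
        name: {key: value for key, value in entry.items() if key != "default_port"}
        for name, entry in data.items()
    }
```

An entry written as `{"name": "mistral", "default_port": 8081}` comes back as `{"name": "mistral"}`, with no warning emitted.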

Change Management

llauncher includes validation rules to prevent problematic actions:

Port conflicts: Prevents starting models on ports already in use
Blacklisted ports: Default blacklist includes port 8080 (commonly used by other services)
Model whitelists: Optionally restrict which models can be started
Caller blacklists: Restrict which callers (UI, MCP, etc.) can perform actions
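
The first three rules above can be sketched as a single pre-start check. All names here (`BLACKLISTED_PORTS`, `port_in_use`, `validate_start`) are hypothetical, and the bind probe is one common way to detect a busy port, not necessarily the one llauncher uses.

```python
import socket

BLACKLISTED_PORTS = {8080}  # default blacklist named in the list above


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


def validate_start(model_name: str, port: int, allowed_models=None) -> list:
    """Collect human-readable reasons a start request should be refused."""
    errors = []
    if port in BLACKLISTED_PORTS:
        errors.append(f"port {port} is blacklisted")
    elif port_in_use(port):
        errors.append(f"port {port} is already in use")
    if allowed_models is not None and model_name not in allowed_models:
        errors.append(f"model {model_name!r} is not on the whitelist")
    return errors
```

An empty list means the request passed every rule. Returning all failures at once lets a caller report them together instead of one per attempt.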

Project Structure

llauncher/
├── pyproject.toml
├── llauncher/
│   ├── __init__.py
│   ├── __main__.py
│   ├── cli.py                  # Typer CLI (model/server/orphan/node/config groups)
│   ├── state.py                # Legacy LauncherState — eviction-compat hook (ADR-008)
│   ├── operations/             # Stateless service layer; MCP/HTTP/CLI/UI all delegate here (ADR-008)
│   │   ├── start.py
│   │   ├── stop.py
│   │   ├── swap.py             # ADR-011 five-phase swap with rollback
│   │   ├── delete.py
│   │   ├── orphan.py           # ADR-015 read-only orphan listing
│   │   └── preflight.py        # Model-health + VRAM seams
│   ├── agent/                  # HTTP agent (FastAPI, port-keyed routes per ADR-010)
│   │   ├── auth.py
│   │   ├── config.py
│   │   ├── footer_cache.py     # /footer-context/{port} TTL cache (ADR-012)
│   │   ├── middleware.py
│   │   ├── routing.py
│   │   └── server.py           # Lifespan handler reaps managed children on SIGTERM/SIGINT
│   ├── mcp_server/             # MCP server (stdio transport)
│   │   ├── server.py
│   │   └── tools/              # servers / models / config tool groups
│   ├── core/                   # Primitive substrate (no LauncherState)
│   │   ├── audit_log.py        # JSON Lines audit (ADR-008)
│   │   ├── config.py           # ConfigStore — single source of truth
│   │   ├── gpu.py              # GPU collector (ADR-006)
│   │   ├── lockfile.py         # Atomic O_EXCL per-port lockfiles
│   │   ├── log_rotation.py     # ADR-013 append + rotate
│   │   ├── marker.py           # In-flight swap/start marker (ADR-011/014)
│   │   ├── model_health.py     # Cache probe (ADR-005)
│   │   ├── process.py          # Subprocess management
│   │   └── settings.py         # LAUNCHER_* env-var family
│   ├── models/
│   │   └── config.py           # Pydantic ModelConfig (no default_port; ADR-010)
│   ├── remote/                 # Multi-node hub-spoke (ADR-009)
│   │   ├── node.py             # RemoteNode (port-keyed ops)
│   │   ├── registry.py         # NodeRegistry
│   │   └── state.py            # RemoteAggregator (swap_on_node parity)
│   └── ui/                     # Streamlit dashboard
│       ├── app.py
│       ├── utils.py            # render_op_result, OpResultSeverity ladder
│       ├── components/
│       │   ├── node_selector.py
│       │   └── port_picker.py  # Explicit port input — no auto-allocation
│       └── tabs/
│           ├── audit.py
│           ├── dashboard.py    # Read-only running view
│           ├── models.py       # Config CRUD + start/stop/swap verbs
│           └── nodes.py
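
The "Atomic O_EXCL per-port lockfiles" note on core/lockfile.py refers to a standard technique: creating a file with O_CREAT and O_EXCL either succeeds or fails atomically, so only one process can claim a port. A minimal sketch follows. The function names, the lock directory, and the `<port>.lock` naming are assumptions, not the module's real layout.

```python
import os
from pathlib import Path


def acquire_port_lock(lock_dir: Path, port: int) -> bool:
    """Try to claim `port`; return False if another holder already has it."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{port}.lock"
    try:
        # O_EXCL makes creation fail if the file exists, atomically.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # record the holder for debugging
    return True


def release_port_lock(lock_dir: Path, port: int) -> None:
    """Remove the lockfile; a missing file is not an error."""
    (lock_dir / f"{port}.lock").unlink(missing_ok=True)
```

A second `acquire_port_lock` call for the same port returns False until `release_port_lock` runs.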

Testing

Run the test suite:

pytest
# or with coverage
pytest --cov=llauncher --cov-report=term-missing

Test files are in tests/:

tests/unit/: Unit tests for models, config, and process
tests/integration/: Integration tests for state management

For an inventory of which tests exist (file-by-file, with markers and docstring first lines), see docs/generated/TEST_SUITE_SUMMARY.md. Regenerate after adding or renaming tests:

python scripts/summarize_tests.py

The coverage floor is pinned at --cov-fail-under=93 against non-UI scope in pytest.ini; UI coverage is deferred to the AppTest harness in #69 (v3-alpha).

Multi-Node Management (Remote)

llauncher supports managing llama-server instances across multiple machines (Windows and Linux) on a local network from a single dashboard.

Architecture

Each managed node runs a lightweight agent that exposes an HTTP API. The "head" dashboard connects to these agents over the LAN:

┌─────────────────────────────────────┐
│         HEAD DASHBOARD              │
│  - Streamlit UI with node selector  │
│  - Connects to all agents via HTTP  │
└─────────────┬───────────────────────┘
              │ LAN (port 8765)
    ┌─────────┼─────────┐
    ▼         ▼         ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Agent  │ │ Agent  │ │ Agent  │
│ Linux  │ │Windows │ │ Linux  │
│ :8765  │ │ :8765  │ │ :8765  │
└────────┘ └────────┘ └────────┘

Deployment

1. Install on Each Node

On every machine you want to manage (including the head):

Linux/macOS:

git clone https://github.com/shanevcantwell/llauncher
cd llauncher
./run.sh install

Windows:

git clone https://github.com/shanevcantwell/llauncher
cd llauncher
run.bat install

2. Start the Agent on Each Node

Using runner scripts (recommended):

Linux/macOS:

./run.sh agent     # Foreground
./run.sh agent-bg  # Background
./run.sh stop      # Stop agent

Windows:

run.bat agent      :: Foreground
run.bat agent-bg   :: Background
run.bat stop       :: Stop agent

With custom configuration:

# Linux/macOS
LLAUNCHER_AGENT_PORT=9000 LLAUNCHER_AGENT_NODE_NAME="my-server" ./run.sh agent

# Windows (PowerShell)
$env:LLAUNCHER_AGENT_PORT="9000"
$env:LLAUNCHER_AGENT_NODE_NAME="my-server"
.\run.bat agent

Environment Variables:

LLAUNCHER_AGENT_HOST: Host to bind to (default: 127.0.0.1). Set to 0.0.0.0 or a specific LAN IP to expose the agent to other hosts — see "Security Notes" below.
LLAUNCHER_AGENT_PORT: Port to listen on (default: 8765)
LLAUNCHER_AGENT_NODE_NAME: Friendly name for the node
LLAUNCHER_AGENT_TOKEN: Required when binding to anything other than loopback. The agent refuses to start on a non-loopback host without it. The special value `-` reads the token from stdin (one line). On a loopback start with no value set, a fresh token is auto-generated and written to ~/.llauncher/agent.token (mode 0600).
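
The variables above could be resolved along these lines. This is a sketch, not the agent's actual settings code: the function name `resolve_agent_settings` is hypothetical, the "special value" is read as a literal `-`, and token auto-generation on loopback is left out.

```python
import os
import sys
from typing import Mapping, Optional, TextIO


def resolve_agent_settings(env: Mapping[str, str] = os.environ,
                           stdin: TextIO = sys.stdin) -> dict:
    """Apply the documented defaults to the LLAUNCHER_AGENT_* variables."""
    token: Optional[str] = env.get("LLAUNCHER_AGENT_TOKEN")
    if token == "-":
        # Assumed meaning of the special value: read one line from stdin.
        token = stdin.readline().strip()
    return {
        "host": env.get("LLAUNCHER_AGENT_HOST", "127.0.0.1"),
        "port": int(env.get("LLAUNCHER_AGENT_PORT", "8765")),
        "node_name": env.get("LLAUNCHER_AGENT_NODE_NAME"),
        "token": token or None,  # an empty string counts as unset
    }
```

Reading the token from stdin keeps it out of the process environment and shell history, which is the usual motivation for a `-` convention.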

3. Start the Dashboard on the Head Machine

Linux/macOS:

./run.sh ui

Windows:

run.bat ui

The dashboard will automatically:

Show a loading screen while initializing
Register itself as the "local" node

4. Add Remote Nodes

In the dashboard:

Go to the Nodes tab
Click ➕ Add New Node
Enter:
- Node Name: Friendly name (e.g., linux-box, windows-server)
- Host: IP address or hostname (e.g., 192.168.1.100)
- Port: Agent port (default: 8765)
Click 🔍 Test Connection to verify
Click ➕ Add Node to register

Network Configuration

Firewall Rules

Ensure port 8765 is open on managed nodes:

Linux (ufw):

sudo ufw allow 8765/tcp

Linux (firewalld):

sudo firewall-cmd --permanent --add-port=8765/tcp
sudo firewall-cmd --reload

Windows (PowerShell):

New-NetFirewallRule -DisplayName "llauncher Agent" -Direction Inbound -LocalPort 8765 -Protocol TCP -Action Allow

Security Notes

Loopback by default: The agent binds to 127.0.0.1 unless LLAUNCHER_AGENT_HOST is set explicitly. Set it to a LAN IP (or 0.0.0.0) to expose the agent to other hosts on the network.
Token required for non-loopback binds: Binding to anything other than 127.0.0.1 / ::1 / localhost requires LLAUNCHER_AGENT_TOKEN to be set. The agent refuses to start otherwise. On loopback first-run with no token configured, a fresh token is generated at ~/.llauncher/agent.token (mode 0600) and printed once to stderr.
Trusted LAN Only: Even with a token, only expose the agent on networks you trust — the transport is plain HTTP (no TLS). Tailscale is the recommended option for cross-host trust.
Firewall: Restrict port 8765 to your LAN subnet.
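
The token rule above amounts to a guard evaluated before the agent binds. A minimal sketch, assuming hypothetical names `is_loopback` and `check_bind`. Hostnames other than localhost are treated as non-loopback here without being resolved, which is a conservative simplification.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """True for localhost, 127.0.0.0/8 addresses, and ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal: assume it is reachable off-host


def check_bind(host: str, token: Optional[str]) -> None:
    """Refuse a non-loopback bind when no token is configured."""
    if not is_loopback(host) and not token:
        raise RuntimeError(
            f"refusing to bind {host} without LLAUNCHER_AGENT_TOKEN"
        )
```

`check_bind("127.0.0.1", None)` passes silently, while `check_bind("0.0.0.0", None)` raises.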

Usage

The sidebar Node Selector (ui/components/node_selector.py) picks the target node — local plus any registered remotes. A single target is always selected; the "All Nodes" cross-node aggregate view was dropped in M4 Slice 13 (#50).

Dashboard Tab: read-only running view of the selected node.
Models Tab: config CRUD + per-model Start / Stop / Swap, acting on the selected node.
Nodes Tab: registered-nodes list with Test Connection and Remove controls.
Audit Tab: tails the local LAUNCHER_AUDIT_PATH. Remote-node audit access is deferred per #64.

Troubleshooting

"Connection Failed" when adding node

Verify agent is running on the remote node:
```
curl http://<node-ip>:8765/health
```
Check firewall rules on the remote node

Verify the agent is binding to the correct interface:

# Default is 127.0.0.1:8765 (loopback). For LAN access you must
# have set LLAUNCHER_AGENT_HOST and LLAUNCHER_AGENT_TOKEN.
netstat -tlnp | grep 8765
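
The /health check above can also be scripted, for example to probe several nodes in a loop. This is a sketch: the helper name `agent_healthy` is an assumption, and because the response body is not documented here, only the HTTP status is inspected.

```python
import urllib.request


def agent_healthy(host: str, port: int = 8765, timeout: float = 2.0) -> bool:
    """Return True if GET /health on the agent answers with HTTP 200."""
    url = f"http://{host}:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError,
        # which are all OSError subclasses.
        return False
```

A False result does not distinguish a stopped agent from a firewall block or a loopback-only bind, so the remaining checks in this section still apply.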

Agent won't start

Check if port 8765 is already in use:

lsof -i :8765
# or
netstat -tlnp | grep 8765

Use a different port:

LLAUNCHER_AGENT_PORT=9000 llauncher-agent

Can't connect from Windows to Linux (or vice versa)

Verify network connectivity:
```
ping <remote-node-ip>
```
Check that the agent is not binding to loopback only:
- The default is 127.0.0.1:8765. For cross-host access set LLAUNCHER_AGENT_HOST=0.0.0.0 (or a specific LAN IP) and LLAUNCHER_AGENT_TOKEN — the agent refuses to start on a non-loopback host without a token.

API Documentation

When an agent is running, visit http://<node-ip>:8765/docs for interactive API documentation.

License

MIT
