An MCP-first launcher and management tool for llama.cpp llama-server instances. The MCP contract is the product; the HTTP Agent, llauncher CLI, and Streamlit UI are co-equal consumers of the same llauncher/operations/ service layer — three surfaces over one core, designed for both programmatic control (LLM agents, multi-node automation) and human operators.
The stateless service layer that every surface delegates to (ADR-008). Adding a verb here surfaces it across all four boundaries automatically.
- Verbs: start, stop, swap, cancel, delete_model, list_orphans
- Pre-flight seams: model-health probe and VRAM estimation, attachable as optional callables on swap()
- ADR-010 port discipline: every verb takes port as a required argument, with no auto-allocation and no env-var fallback
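A minimal sketch of the port discipline and pre-flight seams listed above. The `swap` signature, the `OpResult` type, and the `health_probe` / `vram_estimate` parameter names are hypothetical illustrations, not llauncher's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical result type; llauncher's real return type may differ.
@dataclass
class OpResult:
    ok: bool
    message: str

# A pre-flight check returns None on success or a reason string on failure.
Preflight = Callable[[str], Optional[str]]

def swap(
    model_name: str,
    port: int,  # required: no default value, no env-var fallback
    *,
    health_probe: Optional[Preflight] = None,
    vram_estimate: Optional[Preflight] = None,
) -> OpResult:
    """Illustrative stand-in for a stateless swap verb."""
    for check in (health_probe, vram_estimate):
        if check is None:
            continue  # seams are optional; skip the ones not attached
        reason = check(model_name)
        if reason is not None:
            return OpResult(False, f"pre-flight failed: {reason}")
    # A real verb would stop the old server and start the new one here.
    return OpResult(True, f"would swap {model_name} onto port {port}")
```

Because `port` has no default, calling `swap("mistral")` fails with a `TypeError` instead of silently picking a port.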
Canonical surface for LLM agents and automation. Stdio transport; full read + mutate coverage of the core verbs.
- Discovery: list_models, get_model_config
- Lifecycle: start_server, stop_server, swap_server, cancel_server, server_status, get_server_logs, list_orphans
- Configuration CRUD: add_model, update_model_config, delete_model, validate_config
Same verbs over REST for multi-node setups (ADR-009 hub-spoke). Port-keyed routes (/start/{port}, /swap/{port}, /stop/{port}, /cancel/{port}, /footer-context/{port}) plus /status, /models, /models/health. Token-protected when bound off-loopback (ADR-003).
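The port-keyed routes can be assembled programmatically. This is a minimal sketch: the route names come from the paragraph above, but the HTTP method and the bearer-token header format are assumptions, so check the agent's /docs page before relying on them.

```python
import urllib.request
from typing import Optional

# Port-keyed verbs named in the text; everything else here is illustrative.
PORT_KEYED = {"start", "swap", "stop", "cancel", "footer-context"}

def agent_url(host: str, agent_port: int, verb: str, server_port: int) -> str:
    """Build a port-keyed agent URL such as http://host:8765/start/8081."""
    if verb not in PORT_KEYED:
        raise ValueError(f"not a port-keyed route: {verb}")
    return f"http://{host}:{agent_port}/{verb}/{server_port}"

def build_request(
    url: str,
    token: Optional[str] = None,
    method: str = "POST",  # ASSUMPTION: the real method may differ per route
) -> urllib.request.Request:
    """Prepare (but do not send) a request to the agent."""
    req = urllib.request.Request(url, method=method)
    if token:
        # ASSUMPTION: bearer-token scheme; the actual header is not documented here.
        req.add_header("Authorization", f"Bearer {token}")
    return req
```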
Web dashboard for human operators. Four tabs: Dashboard (read-only running view), Models (config CRUD + per-model start/stop/swap with explicit port picker), Nodes (peer registry), Audit (local audit-log tail).
Typer command-line surface, co-equal with MCP and UI. Subcommand groups: model (list, info), server (start, stop, cancel, status), orphan (list), node (add, list, remove, status), config (path, validate). Rich tables for human output and --json on every group for scripting.
- Config Persistence: Store configurations in ~/.llauncher/config.json (single source of truth)
- Validation: Model paths verified, port conflicts detected, blacklists enforced
# Clone the repository
git clone https://github.com/shanevcantwell/llauncher
cd llauncher
# Install in development mode (with UI)
pip install -e ".[ui]"
# Optional: Install test dependencies
pip install -e ".[test]"
If you see warnings like "WARNING: Ignoring invalid distribution ~" during install:
# Clean up corrupted site-packages and reinstall
cd github\llauncher
rmdir /s /q .venv
python -m venv .venv
.venv\Scripts\activate
pip install -e ".[ui]"
Use the runner scripts for easiest setup:
The dashboard requires the local agent to be running. Start the agent first (in its own terminal), then the dashboard in a second terminal. The UI deliberately does not auto-spawn the agent — see ADR-009 and the "Why doesn't the UI start the agent for me?" expander rendered on the dashboard when the agent is down.
Linux/macOS:
./run.sh install # Set up virtual environment and install
./run.sh agent # Terminal 1: start agent in foreground
./run.sh ui # Terminal 2: start dashboard (requires agent)
./run.sh stop # Stop running agent
# Optional:
./run.sh agent-bg # Start agent detached (logs to agent.log)
./run.sh discover # List discovered launch scripts
Windows:
run.bat install :: Set up virtual environment and install
run.bat agent :: Terminal 1: start agent in foreground
run.bat ui :: Terminal 2: start dashboard (requires agent)
run.bat stop :: Stop running agent
:: Optional:
run.bat agent-bg :: Start agent detached (logs to agent.log)
run.bat discover :: List discovered launch scripts
For a persistent install that survives reboots and restarts on crash,
the agent ships with installers for systemd (Linux, user-mode) and NSSM
(Windows). See docs/operations/run-as-a-service.md.
The UI is not service-managed by design — it's interactive and you
launch it on demand.
Start the MCP server:
llauncher-mcp
Or configure in your MCP client (e.g., Claude Code):
{
  "mcpServers": {
    "llauncher": {
      "command": "llauncher-mcp",
      "args": []
    }
  }
}
Trust boundary (stdio only). The MCP server speaks the MCP stdio transport and has no authentication of its own: it implicitly trusts whatever process spawned it over the stdio pipe (typically your MCP client, e.g. Claude Desktop / Claude Code). There is no network listener for MCP. Vetting the MCP client you hand these tools to is the operator's responsibility; llauncher cannot distinguish a benign caller from a malicious one once the stdio pipe is open. See docs/plans/security-hardening-plan.md §2.2 (control C5) for the threat-model rationale.
| Tool | Description |
|---|---|
| list_models | List all configured models with current status (running/stopped) |
| get_model_config | Get full configuration details for a specific model |
| start_server | Start a llama-server instance on a given port (model_name + port required; ADR-010) |
| stop_server | Stop a running server by port number |
| swap_server | Atomically swap models on a port with rollback guarantee (ADR-011) |
| cancel_server | Cancel an in-flight start/swap on a port (ADR-014) |
| server_status | Get status summary of all running servers |
| get_server_logs | Fetch recent log lines from a running server |
| list_orphans | List unmanaged llama-server processes on the local node (ADR-015) |
| update_model_config | Update an existing model's configuration |
| validate_config | Validate a configuration without applying it |
| add_model | Add a new model configuration to the store |
| delete_model | Delete a model configuration (refuses if running; ADR-008 §4.1) |
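
For orientation, a tool invocation travels over stdio as a JSON-RPC 2.0 `tools/call` message, one JSON object per line. The sketch below only encodes such a message; the `tool_call` helper is hypothetical, and the argument names `model_name` and `port` are taken from the start_server row above. A real session also needs the MCP initialize handshake first, so in practice an MCP client library handles this for you.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids

def tool_call(name: str, arguments: dict) -> str:
    """Encode one MCP tools/call request as a newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"

# Port is always explicit (ADR-010); there is no default to fall back on.
line = tool_call("start_server", {"model_name": "mistral", "port": 8081})
```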
Start the UI using the runner script (recommended):
Linux/macOS:
./run.sh ui
Windows:
run.bat ui
Bind to loopback (no built-in auth). Streamlit binds wherever the operator launches it; the default is loopback. The runner scripts launch with --server.address 127.0.0.1, and that is the recommended invocation for typical single-operator use. The dashboard itself has no built-in authentication, so anything that can reach the port can drive every mutate path (start/stop servers, edit configs, manage nodes). Do not expose it beyond loopback without an operator-supplied gateway in front: Tailscale, an SSH tunnel, or a reverse proxy that enforces auth. Passing --server.address 0.0.0.0 (or a LAN IP) without one of those is equivalent to publishing an unauthenticated admin console on your network. See docs/plans/security-hardening-plan.md §2.8 (control C12) for the threat-model rationale.
Read-only running view (no mutate verbs live here per M4 Slice 13 / #50). Status indicators (🟢 Running / ⚫ Stopped), uptime, and live log tail for each active server. Use the Models tab to start/stop/swap.
Config CRUD plus the per-model verb buttons. Add / edit / delete configurations and drive Start, Stop, Swap against the selected target node. Includes the explicit port picker (ui/components/port_picker.py) — ADR-010 requires the operator to choose the port at every call site; there is no auto-allocation or remembered default.
Peer registry for multi-node setups. Add / list / remove remote agent nodes, test connectivity, and observe status. The sidebar node_selector (ui/components/node_selector.py) chooses which node the Models tab acts against.
Tails the local audit log at LAUNCHER_AUDIT_PATH (~/.llauncher/audit.jsonl by default). Read-only view of commanded vs. observed events. Remote-node audit access is deferred per #64.
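The same tail can be reproduced outside the UI. A minimal sketch, assuming only what the paragraph above states (a JSON Lines file at LAUNCHER_AUDIT_PATH, defaulting to ~/.llauncher/audit.jsonl); the helper names are hypothetical and no particular record schema is assumed.

```python
import json
import os
from collections import deque
from pathlib import Path

def audit_path() -> Path:
    """Resolve the audit log path from LAUNCHER_AUDIT_PATH or the documented default."""
    default = Path.home() / ".llauncher" / "audit.jsonl"
    return Path(os.environ.get("LAUNCHER_AUDIT_PATH", default))

def tail_audit(path: Path, n: int = 20) -> list:
    """Return the last n parseable records; skip blank or corrupt lines."""
    records = deque(maxlen=n)  # keeps only the newest n entries
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:
                continue
            try:
                records.append(json.loads(raw))
            except json.JSONDecodeError:
                continue  # tolerate a partially written trailing line
    return list(records)
```

Usage would be `tail_audit(audit_path())` on a node that has already written audit events.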
The llauncher Typer CLI is a co-equal consumer of llauncher/operations/ alongside the MCP server, HTTP Agent, and Streamlit UI. Every group supports a --json / -j flag for machine-readable output; the default is a Rich-rendered color table for human use.
Subcommand groups:
# Model configurations (read-only)
llauncher model list
llauncher model info mistral-7b
# Server lifecycle — port is required on start (ADR-010)
llauncher server start mistral-7b --port 8081
llauncher server stop 8081
llauncher server cancel 8081 # ADR-014: signals an in-flight start/swap
llauncher server status --json
# Orphans — unmanaged llama-server processes (ADR-015, read-only)
llauncher orphan list
# Remote nodes (ADR-009)
llauncher node add my-server --host 192.168.1.100 --port 8765
llauncher node list
llauncher node status --all
llauncher node remove my-server
# Configuration store
llauncher config path # print path to config.json
llauncher config validate mistral-7b
Each group also accepts --help. The runner scripts (./run.sh agent, ./run.sh ui) remain the easiest way to launch the agent and dashboard; the CLI subcommands above act against an already-running stack.
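For scripting, the --json flag pairs naturally with a thin wrapper. This is a sketch only: the `llauncher_json` helper is hypothetical, the shape of the JSON output is not documented here and is left unparsed beyond `json.loads`, and the `runner` parameter exists purely so the function can be exercised without the CLI installed.

```python
import json
import subprocess

def llauncher_json(*args: str, runner=subprocess.run):
    """Run `llauncher <args> --json` and return the parsed output."""
    proc = runner(
        ["llauncher", *args, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)

# Example (requires llauncher on PATH and a running stack):
# status = llauncher_json("server", "status")
```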
Create model configurations directly in ~/.llauncher/config.json, or manage them through the UI or the MCP tools.
Example config entry:
{
  "mistral": {
    "name": "mistral",
    "model_path": "/path/to/model.gguf",
    "mmproj_path": null,
    "n_gpu_layers": 255,
    "ctx_size": 131072,
    "threads": 8,
    "threads_batch": 8,
    "ubatch_size": 512,
    "batch_size": null,
    "flash_attn": "on",
    "no_mmap": false,
    "cache_type_k": "f32",
    "cache_type_v": "f32",
    "n_cpu_moe": null,
    "parallel": 1,
    "temperature": null,
    "top_k": null,
    "top_p": null,
    "min_p": null,
    "repeat_penalty": null,
    "reverse_prompt": null,
    "mlock": false,
    "extra_args": ""
  }
}
Per ADR-010, port is supplied at every call site (UI port picker, CLI --port, MCP port arg, HTTP /start/{port} route) and is not persisted in the config. Legacy default_port entries in config.json are silently dropped on load.
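The drop-on-load behaviour for legacy default_port entries can be pictured as follows. This is an illustrative sketch using plain dictionaries; the project itself validates entries with a Pydantic ModelConfig, and the `load_models` helper here is hypothetical.

```python
import json

def load_models(text: str) -> dict:
    """Parse config.json content and drop any legacy default_port keys."""
    raw = json.loads(text)
    return {
        name: {key: value for key, value in entry.items() if key != "default_port"}
        for name, entry in raw.items()
    }

legacy = '{"mistral": {"name": "mistral", "model_path": "/path/to/model.gguf", "default_port": 8081}}'
models = load_models(legacy)
# The port must now be passed explicitly when starting the model.
```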
llauncher includes validation rules to prevent problematic actions:
- Port conflicts: Prevents starting models on ports already in use
- Blacklisted ports: Default blacklist includes port 8080 (commonly used by other services)
- Model whitelists: Optionally restrict which models can be started
- Caller blacklists: Restrict which callers (UI, MCP, etc.) can perform actions
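The first three rules can be sketched as a single pre-start check. The function names and the order of the checks are assumptions made for illustration; only the default blacklist entry (port 8080) comes from the list above.

```python
import socket
from typing import Callable, Iterable, Optional

DEFAULT_BLACKLIST = frozenset({8080})  # default named in the list above

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP listener already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0

def check_start(
    model: str,
    port: int,
    *,
    blacklist: Iterable[int] = DEFAULT_BLACKLIST,
    whitelist: Optional[Iterable[str]] = None,  # None means "no restriction"
    in_use: Callable[[int], bool] = port_in_use,
) -> Optional[str]:
    """Return a refusal reason, or None if the start may proceed."""
    if port in blacklist:
        return f"port {port} is blacklisted"
    if whitelist is not None and model not in whitelist:
        return f"model {model!r} is not whitelisted"
    if in_use(port):
        return f"port {port} is already in use"
    return None
```

Passing `in_use` as a parameter keeps the check testable without opening real sockets.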
llauncher/
├── pyproject.toml
├── llauncher/
│   ├── __init__.py
│   ├── __main__.py
│   ├── cli.py  # Typer CLI (model/server/orphan/node/config groups)
│   ├── state.py  # Legacy LauncherState: eviction-compat hook (ADR-008)
│   ├── operations/  # Stateless service layer; MCP/HTTP/CLI/UI all delegate here (ADR-008)
│   │   ├── start.py
│   │   ├── stop.py
│   │   ├── swap.py  # ADR-011 five-phase swap with rollback
│   │   ├── delete.py
│   │   ├── orphan.py  # ADR-015 read-only orphan listing
│   │   └── preflight.py  # Model-health + VRAM seams
│   ├── agent/  # HTTP agent (FastAPI, port-keyed routes per ADR-010)
│   │   ├── auth.py
│   │   ├── config.py
│   │   ├── footer_cache.py  # /footer-context/{port} TTL cache (ADR-012)
│   │   ├── middleware.py
│   │   ├── routing.py
│   │   └── server.py  # Lifespan handler reaps managed children on SIGTERM/SIGINT
│   ├── mcp_server/  # MCP server (stdio transport)
│   │   ├── server.py
│   │   └── tools/  # servers / models / config tool groups
│   ├── core/  # Primitive substrate (no LauncherState)
│   │   ├── audit_log.py  # JSON Lines audit (ADR-008)
│   │   ├── config.py  # ConfigStore, the single source of truth
│   │   ├── gpu.py  # GPU collector (ADR-006)
│   │   ├── lockfile.py  # Atomic O_EXCL per-port lockfiles
│   │   ├── log_rotation.py  # ADR-013 append + rotate
│   │   ├── marker.py  # In-flight swap/start marker (ADR-011/014)
│   │   ├── model_health.py  # Cache probe (ADR-005)
│   │   ├── process.py  # Subprocess management
│   │   └── settings.py  # LAUNCHER_* env-var family
│   ├── models/
│   │   └── config.py  # Pydantic ModelConfig (no default_port; ADR-010)
│   ├── remote/  # Multi-node hub-spoke (ADR-009)
│   │   ├── node.py  # RemoteNode (port-keyed ops)
│   │   ├── registry.py  # NodeRegistry
│   │   └── state.py  # RemoteAggregator (swap_on_node parity)
│   └── ui/  # Streamlit dashboard
│       ├── app.py
│       ├── utils.py  # render_op_result, OpResultSeverity ladder
│       ├── components/
│       │   ├── node_selector.py
│       │   └── port_picker.py  # Explicit port input, no auto-allocation
│       └── tabs/
│           ├── audit.py
│           ├── dashboard.py  # Read-only running view
│           ├── models.py  # Config CRUD + start/stop/swap verbs
│           └── nodes.py
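The tree mentions atomic O_EXCL per-port lockfiles in core/lockfile.py. The general technique looks roughly like the sketch below; the function names, the `<port>.lock` naming scheme, and the PID payload are hypothetical and are not taken from the project's source.

```python
import os
from pathlib import Path

def acquire_port_lock(lock_dir: Path, port: int) -> bool:
    """Try to claim a port by creating its lockfile atomically."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    path = lock_dir / f"{port}.lock"  # hypothetical naming scheme
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so two processes cannot both believe they hold the same port.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))  # record the owner for later inspection
    return True

def release_port_lock(lock_dir: Path, port: int) -> None:
    """Remove the lockfile; a missing file is not an error."""
    (lock_dir / f"{port}.lock").unlink(missing_ok=True)
```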
Run the test suite:
pytest
# or with coverage
pytest --cov=llauncher --cov-report=term-missing
Test files are in tests/:
- tests/unit/: Unit tests for models, config, and process
- tests/integration/: Integration tests for state management
For an inventory of which tests exist (file-by-file, with markers and
docstring first lines), see docs/generated/TEST_SUITE_SUMMARY.md.
Regenerate after adding or renaming tests:
python scripts/summarize_tests.py
The coverage floor is pinned at --cov-fail-under=93 against non-UI
scope in pytest.ini; UI coverage is deferred to the AppTest harness
in #69 (v3-alpha).
llauncher supports managing llama-server instances across multiple machines (Windows and Linux) on a local network from a single dashboard.
Each managed node runs a lightweight agent that exposes an HTTP API. The "head" dashboard connects to these agents over the LAN:
┌─────────────────────────────────────┐
│           HEAD DASHBOARD            │
│  - Streamlit UI with node selector  │
│  - Connects to all agents via HTTP  │
└─────────────┬───────────────────────┘
              │ LAN (port 8765)
    ┌─────────┼─────────┐
    ▼         ▼         ▼
┌────────┐ ┌────────┐ ┌────────┐
│ Agent │ │ Agent │ │ Agent │
│ Linux │ │Windows │ │ Linux │
│ :8765 │ │ :8765 │ │ :8765 │
└────────┘ └────────┘ └────────┘
On every machine you want to manage (including the head):
Linux/macOS:
git clone https://github.com/shanevcantwell/llauncher
cd llauncher
./run.sh install
Windows:
git clone https://github.com/shanevcantwell/llauncher
cd llauncher
run.bat install
Using runner scripts (recommended):
Linux/macOS:
./run.sh agent # Foreground
./run.sh agent-bg # Background
./run.sh stop # Stop agent
Windows:
run.bat agent :: Foreground
run.bat agent-bg :: Background
run.bat stop :: Stop agent
With custom configuration:
# Linux/macOS
LLAUNCHER_AGENT_PORT=9000 LLAUNCHER_AGENT_NODE_NAME="my-server" ./run.sh agent
# Windows (PowerShell)
$env:LLAUNCHER_AGENT_PORT="9000"
$env:LLAUNCHER_AGENT_NODE_NAME="my-server"
run.bat agent
Environment Variables:
- LLAUNCHER_AGENT_HOST: Host to bind to (default: 127.0.0.1). Set to 0.0.0.0 or a specific LAN IP to expose the agent to other hosts; see "Security Notes" below.
- LLAUNCHER_AGENT_PORT: Port to listen on (default: 8765)
- LLAUNCHER_AGENT_NODE_NAME: Friendly name for the node
- LLAUNCHER_AGENT_TOKEN: Required when binding to anything other than loopback. The agent refuses to start on a non-loopback host without it. The special value "-" reads the token from stdin (one line). On a loopback start with no value set, a fresh token is auto-generated and written to ~/.llauncher/agent.token (mode 0600).
Linux/macOS:
./run.sh ui
Windows:
run.bat ui
The dashboard will automatically:
- Show a loading screen while initializing
- Register itself as the "local" node
In the dashboard:
- Go to the Nodes tab
- Click ➕ Add New Node
- Enter:
  - Node Name: Friendly name (e.g., linux-box, windows-server)
  - Host: IP address or hostname (e.g., 192.168.1.100)
  - Port: Agent port (default: 8765)
- Click 🔍 Test Connection to verify
- Click ➕ Add Node to register
Ensure port 8765 is open on managed nodes:
Linux (ufw):
sudo ufw allow 8765/tcp
Linux (firewalld):
sudo firewall-cmd --permanent --add-port=8765/tcp
sudo firewall-cmd --reload
Windows (PowerShell):
New-NetFirewallRule -DisplayName "llauncher Agent" -Direction Inbound -LocalPort 8765 -Protocol TCP -Action Allow
Security Notes:
- Loopback by default: The agent binds to 127.0.0.1 unless LLAUNCHER_AGENT_HOST is set explicitly. Set it to a LAN IP (or 0.0.0.0) to expose the agent to other hosts on the network.
- Token required for non-loopback binds: Binding to anything other than 127.0.0.1 / ::1 / localhost requires LLAUNCHER_AGENT_TOKEN to be set. The agent refuses to start otherwise. On loopback first-run with no token configured, a fresh token is generated at ~/.llauncher/agent.token (mode 0600) and printed once to stderr.
- Trusted LAN Only: Even with a token, only expose the agent on networks you trust, because the transport is plain HTTP (no TLS). Tailscale is the recommended option for cross-host trust.
- Firewall: Restrict port 8765 to your LAN subnet.
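The loopback and token rules above amount to a small startup guard. A minimal sketch, assuming the three loopback spellings named in the list; the function names are hypothetical and the agent's real check may differ in detail (for example, in how it treats hostnames).

```python
import ipaddress
from typing import Optional

def is_loopback(host: str) -> bool:
    """True for localhost, 127.0.0.0/8 addresses, and ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames are treated as non-loopback here

def check_bind(host: str, token: Optional[str]) -> None:
    """Raise if a non-loopback bind is requested without a token."""
    if not is_loopback(host) and not token:
        raise ValueError(
            f"refusing to bind {host} without LLAUNCHER_AGENT_TOKEN"
        )
```

Note that 0.0.0.0 is not a loopback address, so `check_bind("0.0.0.0", None)` raises.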
The sidebar Node Selector (ui/components/node_selector.py) picks the target node — local plus any registered remotes. A single target is always selected; the "All Nodes" cross-node aggregate view was dropped in M4 Slice 13 (#50).
- Dashboard Tab: read-only running view across the selected node.
- Models Tab: config CRUD + per-model Start / Stop / Swap, acting on the selected node.
- Nodes Tab: registered-nodes list with Test Connection and Remove controls.
- Audit Tab: tails the local LAUNCHER_AUDIT_PATH. Remote-node audit access is deferred per #64.
- Verify agent is running on the remote node:
  curl http://<node-ip>:8765/health
- Check firewall rules on the remote node
- Verify the agent is binding to the correct interface:
  # Default is 127.0.0.1:8765 (loopback). For LAN access you must
  # have set LLAUNCHER_AGENT_HOST and LLAUNCHER_AGENT_TOKEN.
  netstat -tlnp | grep 8765
- Check if port 8765 is already in use:
  lsof -i :8765
  # or
  netstat -tlnp | grep 8765
- Use a different port:
  LLAUNCHER_AGENT_PORT=9000 llauncher-agent
- Verify network connectivity:
  ping <remote-node-ip>
- Check that the agent is not binding to loopback only:
  - The default is 127.0.0.1:8765. For cross-host access set LLAUNCHER_AGENT_HOST=0.0.0.0 (or a specific LAN IP) and LLAUNCHER_AGENT_TOKEN; the agent refuses to start on a non-loopback host without a token.
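The /health probe from the troubleshooting steps can also be scripted, for example to check several nodes in a loop. A minimal sketch: the `agent_healthy` helper is hypothetical, it treats any HTTP 200 as healthy without inspecting the response body, and it assumes /health does not require a token.

```python
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy; agents are expected on the local network.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def agent_healthy(host: str, port: int = 8765, timeout: float = 2.0) -> bool:
    """Return True if GET http://host:port/health answers with HTTP 200."""
    url = f"http://{host}:{port}/health"
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False
```

A `False` result does not say why the probe failed; the firewall, bind address, and port checks above narrow that down.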
When an agent is running, visit http://<node-ip>:8765/docs for interactive API documentation.
MIT