A secure runtime for AI agents.
Your agent runs on the host; the code it executes runs in a gVisor box.
    cd ~/code/my-repo
    temenos claude   # Claude runs on the host; everything it *executes* runs in a box

That one command keeps Claude Code where it works best (on the host, with its auth,
updates, and model API intact) while banning every native host-touching tool (Bash,
Read, Write, Edit, WebFetch, …) and routing its only execution path through a
box: a rootless gVisor sandbox with a small, Python-native policy.
A shell that tries to `rm -rf ~`, read `~/.ssh/id_rsa`, or overwrite `/usr/bin` is contained,
not because the model promised to behave, but because the sandbox boundary won't let it (and
`--no-net` cuts egress too). The agent is trusted; the code it runs is not.
And because that boundary is structural (a banned tool, not a model on its best behavior), it holds the same whether you supervise one agent by hand or run a thousand in allow-all mode. Same box, any scale. Scale it up when you need to.
temenos (τέμενος): a bounded precinct, a space set apart with a clear edge.
- Agent on the host, execution in a box. No broken updates, no API keys plumbed into a container, no re-auth. Only the code the agent runs is sandboxed.
- One box or a thousand. The multi-box `BoxManager` is the same code path whether it's one repo, your overnight swarm, or a multi-tenant platform. Allow-all stays safe because the dangerous capability is removed, not merely discouraged.
- Real isolation, not a syscall allowlist. gVisor is a userspace kernel: the host filesystem is invisible beyond what policy mounts, and most kernel-CVE surface is intercepted before it reaches the host. Network is one flag (`--no-net` to fully isolate).
- Sole-execution-path, enforced. `temenos claude` denies native tools and exposes only `mcp__temenos__exec/read/write/list` over MCP, with `--strict-mcp-config` so a stray `.mcp.json` can't re-open a host-capable server.
- Boxes are first-class. Named, persistent, checkpointed, inspectable: `temenos exec`, `temenos shell`, `temenos diff`, `temenos audit`. Everything lives in a `.temenos/<box>/` you can `rm -rf`.
- Durable by default. Background checkpoint (gVisor `fscheckpoint`, ~30 ms) + restore on next use: re-run `temenos claude` in a repo and you resume where you left off.
- A clean core API. `Policy → Box → ExecResult`. The CLI and MCP server are thin layers over the same `Box` you can use directly from Python. Core has zero runtime deps.
- Leak-tested. A containment battery (`tests/leak/`) is the acceptance gate: no host write, host secrets invisible, egress blocked when isolated, `/proc` escape blocked, memory cap OOM-kills.
A runtime that gives trusted agents an untrusted-code execution surface, whether
that's one agent or a swarm of them. You point a harness (Claude Code today; any MCP-capable
agent in principle) at a box and remove its host-touching tools. The agent keeps editing
your real files and calling its model, but every bash/python/file/network action it takes
happens inside gVisor, under a policy you set, observable and reversible. Run that for one
repo, or run it fifty times in parallel under one daemon: same boundary either way.
| Not… | Because |
|---|---|
| A Docker / container runtime | It doesn't package or ship services. It wraps gVisor to confine an agent's execution and mounts your real repo live; the unit is a task, not an image. |
| A VM-per-task sandbox | The agent stays on the host (auth, updates, model API intact). Spinning a VM per task throws all that away; temenos boxes only what runs. |
| A seccomp / AppArmor filter | gVisor is a full userspace kernel, not a syscall allowlist bolted onto the host kernel: a categorically larger isolation boundary. |
| A defense against a malicious agent | The threat model trusts the agent binary. temenos contains the untrusted code the agent runs, not the agent itself. |
| A network firewall | v1 network is a toggle: full passthrough by default (no filtering) or off (`--no-net`, isolated). Filtered per-host egress is post-v1, and it is the load-bearing gap for adversarial fleets (see limits). |
| | temenos | Docker container | VM per task | firejail / bubblewrap | prompt guardrails |
|---|---|---|---|---|---|
| Isolation boundary | userspace kernel (gVisor) | shared host kernel + ns | hardware | shared kernel + seccomp/ns | none |
| Agent stays on host (auth/updates intact) | β | β | β | ||
| Sole-execution-path for an agent | ✅ built-in (deny natives + MCP) | DIY | DIY | DIY | ❌ (trust the model) |
| Fleet control plane (N boxes, one daemon) | ✅ `BoxManager` | DIY (compose/k8s) | DIY | ❌ | ❌ |
| Kernel-CVE surface | low | high | low | high | n/a |
| Per-task object (named, checkpointed, inspectable) | β | β (containers) | β | β | |
| Setup per task | low (rootless, a box dir) | medium | high | low | none |
In short: containers and VMs isolate whole programs you ship; firejail filters syscalls on the host kernel; prompt-level guardrails ask nicely. temenos isolates the code trusted agents run, keeps the agents on the host, and makes each box a first-class, inspectable object you can run one of, or a fleet of. It builds on gVisor and the Model Context Protocol.
    you ──► claude (host)                       (×N agents, in a swarm)
              │  native tools BANNED (--disallowedTools, --strict-mcp-config)
              │  only mcp__temenos__* ALLOWED
              ▼
        temenos daemon ──HTTP /mcp/<box-id>──► Box (gVisor / runsc)
        (one per user,                          • host /usr,/etc bound read-only
         supervises every box)                  • repo mounted (live-writable by default)
                                                • network on by default (--no-net isolates)
                                                • writes land in an overlay
A box = a `Policy` + a gVisor runtime + a data dir. One daemon per user auto-spawns on
first use and supervises every box, serving a REST control plane (the CLI) and a per-box MCP
data plane (the agents). Boxes are keyed by the hash of their data dir, so two repos' default
boxes (or fifty swarm agents) never collide. For the full design, decisions, and verification
log, see `plan.md`.
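To make the "keyed by the hash of their data dir" idea concrete, here is a minimal sketch. The helper name `box_id_for`, the choice of SHA-256, and the 12-character truncation are all assumptions made for illustration; the text above only says that a hash of the data dir is used, not which one.

```python
import hashlib
from pathlib import Path


def box_id_for(data_dir: str, length: int = 12) -> str:
    """Hypothetical helper: derive a stable id from a box's data directory."""
    # Normalise first so "/a/b/" and "/a/b" map to the same key.
    normalised = Path(data_dir).expanduser().resolve()
    digest = hashlib.sha256(str(normalised).encode("utf-8")).hexdigest()
    # Truncation length is an assumption; shorter ids trade collision margin for brevity.
    return digest[:length]


# Two repos that both use a box named "default" still get distinct ids,
# because the ids come from the full data-dir path, not the box name.
repo_a = box_id_for("/home/me/code/repo-a/.temenos/default")
repo_b = box_id_for("/home/me/code/repo-b/.temenos/default")
print(repo_a != repo_b)
```

Deriving the id from the path rather than the name is what lets every repo keep a box called `default` without a central naming authority.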
temenos is Linux + gVisor for v1; a macOS (Seatbelt) backend is designed (see
`macos_plan.md`).
1. gVisor (`runsc`): the sandbox. (official guide)

    ARCH=$(uname -m)
    wget https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}/runsc
    chmod +x runsc && sudo mv runsc /usr/local/bin/

2. temenos

    pip install "temenos[all]"   # daemon + MCP + CLI
    # or from a checkout:
    git clone https://github.com/farizrahman4u/temenos && cd temenos
    pip install -e ".[all,dev]"

The core library has zero runtime deps; `[all]` pulls FastAPI/uvicorn/mcp/httpx for the
daemon and CLI. The bare `temenos image …` commands work without extras.

3. (optional) mmdebstrap: to build clean box base images (so boxes can apt/pip/npm
install into a writable system). Without it you boot against the host's read-only `/usr`.

    sudo apt-get install mmdebstrap

4. Check your host:

    $ temenos doctor
    gVisor (runsc): yes
    platform: ptrace        # kvm on bare metal, systrap on most VMs, ptrace on WSL2
    mmdebstrap: yes
    systemd-run: yes        # required to ENFORCE memory/cpu limits (see Limits)

Then create a box in a repo and run commands in it:

    cd ~/code/my-repo
    temenos create                                   # makes .temenos/default in this repo (+ .gitignore)
    temenos exec default -- python3 -c "print(6*7)"
    temenos exec -it default -- python3              # interactive REPL (PTY); also vim, bash, etc.
    temenos shell default                            # an interactive shell inside the box
    temenos ls                                       # boxes the daemon is running
    temenos audit default                            # what ran in the box
    temenos diff default                             # files under the box's write paths
    temenos rm default                               # stop + delete the box

A bare box name resolves project-first (`.temenos/<name>`, walking up from CWD), then
global (`~/.local/share/temenos/boxes/<name>`); a project box shadows a global one of the
same name (with a warning).
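The project-first lookup described above can be sketched in a few lines of standard-library Python. The function name `resolve_box_dir` and its signature are hypothetical, and the shadowing warning is left out to keep the sketch short; this shows the lookup order only, not temenos's own resolver.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_box_dir(name: str, cwd: Path, global_root: Path) -> Optional[Path]:
    """Hypothetical resolver: project boxes first, then the global box directory."""
    # Project-first: look for .temenos/<name> in cwd, then in each parent directory.
    for directory in (cwd, *cwd.parents):
        candidate = directory / ".temenos" / name
        if candidate.is_dir():
            return candidate
    # Fall back to the global location (e.g. ~/.local/share/temenos/boxes).
    candidate = global_root / name
    return candidate if candidate.is_dir() else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    project_box = root / "repo" / ".temenos" / "default"
    global_box = root / "global" / "default"
    nested_cwd = root / "repo" / "src" / "pkg"
    for path in (project_box, global_box, nested_cwd):
        path.mkdir(parents=True)
    # From a nested directory, the project box shadows the global one.
    found = resolve_box_dir("default", nested_cwd, root / "global")
    print(found == project_box)
```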
Attach Claude Code:

    temenos claude                          # box 'default' in this repo (network on by default)
    temenos claude --box review --no-net    # a separate box, fully network-isolated
    temenos claude --dry-run                # print the exact claude invocation, don't launch
    temenos claude -- --model opus          # args after `--` go to claude

The repo mounts live-writable, so the agent's edits land in your real files: the sandbox
contains execution, not the trusted agent's edits. `--ephemeral` flips the repo to read-only.
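For intuition about what `--dry-run` might print, here is a rough sketch of assembling such an invocation. Everything here is an assumption made for illustration: the helper name `build_claude_argv`, the loopback URL, the port (borrowed from the `temenos serve` example), the MCP config shape, and the use of `--mcp-config` and `--allowedTools`. Only `--disallowedTools`, `--strict-mcp-config`, the `mcp__temenos__*` tool names, and the `/mcp/<box-id>` path come from this document. Run `temenos claude --dry-run` to see the real command.

```python
import json
import shlex
from typing import List, Sequence

# Partial list: the document names these five and elides the rest.
NATIVE_TOOLS = ["Bash", "Read", "Write", "Edit", "WebFetch"]
BOX_TOOLS = ["exec", "read", "write", "list"]


def build_claude_argv(box_id: str, port: int = 8839, extra: Sequence[str] = ()) -> List[str]:
    """Hypothetical sketch of an argv that bans native tools and allows only box tools."""
    # Assumed config shape: a single HTTP MCP server named "temenos".
    mcp_config = {
        "mcpServers": {
            "temenos": {"type": "http", "url": f"http://127.0.0.1:{port}/mcp/{box_id}"}
        }
    }
    argv = [
        "claude",
        "--mcp-config", json.dumps(mcp_config),
        "--strict-mcp-config",  # ignore any stray .mcp.json in the repo
        "--disallowedTools", ",".join(NATIVE_TOOLS),
        "--allowedTools", ",".join(f"mcp__temenos__{tool}" for tool in BOX_TOOLS),
    ]
    argv.extend(extra)  # everything after `--` is passed through unchanged
    return argv


argv = build_claude_argv("abc123", extra=["--model", "opus"])
print(shlex.join(argv))
```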
Fan a task across dozens of agents and approving each tool call by hand is a non-starter, so
you run them allow-all. The structural boundary is what makes that safe: an agent can yolo
freely because there's nothing dangerous to allow; every action lands in a policy'd box. The
CLI and MCP server are thin layers over the same `Box`/`BoxManager` you can drive directly:
    from temenos import Box, Policy
    from temenos.manager import BoxManager

    # one box, directly: filesystem locked by default (no host writes, tight limits);
    # network is on by default, so pass network=False to isolate it
    with Box("demo", Policy(write=["/home/me/out"], network=False)) as box:
        box.write_file("/home/me/out/run.py", "print(6 * 7)\n")
        print(box.exec(["python3", "/home/me/out/run.py"]).stdout)  # "42\n"
        box.exec(["cat", "/etc/shadow"]).ok  # -> False (host invisible)

    # a fleet: one contained box per agent, via the registry the daemon owns
    mgr = BoxManager()
    ids = [mgr.create(f"/srv/boxes/agent-{i}", Policy()) for i in range(50)]
    for bid in ids:
        print(mgr.get(bid).exec(["echo", "hi"]).stdout.strip())
    mgr.shutdown()  # checkpoints (where enabled) + tears down the whole fleet

`Policy` is frozen; `restrict()` derives child policies that can only narrow (widening raises
`PolicyViolation`). gVisor's density is what makes a per-agent box cheap: a VM each is too
heavy, a plain container a weaker boundary. (`mgr.map(...)` fan-out sugar is on the
roadmap; the loop above works today.) Runnable:
`examples/python_api.py`.
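The "frozen, narrow-only" rule is easy to see in a toy model. The sketch below is not temenos's `Policy`: `SketchPolicy`, its two fields, and the exact-match path comparison are assumptions chosen to keep the example small, and it defines its own stand-in `PolicyViolation`. It only shows the shape of the invariant: a child may drop capabilities but never gain them.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import FrozenSet, Iterable, Optional


class PolicyViolation(Exception):
    """Stand-in exception for this sketch: a child policy tried to widen its parent."""


@dataclass(frozen=True)
class SketchPolicy:
    write: FrozenSet[str] = frozenset()
    network: bool = True

    def restrict(self, write: Optional[Iterable[str]] = None,
                 network: Optional[bool] = None) -> "SketchPolicy":
        new_write = self.write if write is None else frozenset(write)
        new_network = self.network if network is None else network
        # Narrowing only: the child's write set must be a subset of the parent's.
        if not new_write <= self.write:
            raise PolicyViolation(f"cannot add write paths: {sorted(new_write - self.write)}")
        # Network can be switched off by a child, never back on.
        if new_network and not self.network:
            raise PolicyViolation("cannot re-enable network")
        return SketchPolicy(write=new_write, network=new_network)


parent = SketchPolicy(write=frozenset({"/out", "/scratch"}), network=True)
child = parent.restrict(write=["/out"], network=False)
print(sorted(child.write), child.network)
```

Because the dataclass is frozen, the only way to get a different policy is through `restrict()`, which is where the subset check lives.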
    temenos serve --port 8839   # REST control + per-box MCP (/mcp/<box-id>), supervising every box

`BoxManager` is also the multi-tenant control plane: a "tenant" and an "agent" are the same
abstraction, so "run my swarm" and "run many customers' agents on untrusted code" are the same
code, not two products. The isolation invariant (no writable mount is ever shared across
boxes) holds today; tenant-scoped tokens and aggregate quotas are the platform-tier
roadmap.
Full docs live in `docs/`:
- Concepts: boxes, policies, the daemon, scope, checkpoints, images
- CLI reference · Python API
- Agents & MCP: `temenos claude` and wiring other harnesses
- Box images · Security model · Configuration
The agent runs on the host; only what it executes runs in a box. That single split is what makes temenos both usable and safe: the agent keeps its identity, updates, and model access (so it actually works), while every command it issues crosses a hard sandbox edge (so it can't hurt you). Everything else (the MCP data plane, the banned-natives wiring, the checkpointing box, the multi-box registry) exists to make that split airtight and the box the sole execution path for the code the agent runs. And because that boundary is structural, not a promise, the split holds identically whether you supervise one agent by hand or run a hundred in allow-all mode: the box is the enforcement, not the human.
| Command | What it does |
|---|---|
| `temenos doctor` | gVisor/platform/mmdebstrap/systemd capability check |
| `temenos image build NAME [--from mmdebstrap\|minimal\|host-copy\|download]` | build a box base image |
| `temenos image ls` · `rm NAME` | list / remove images |
| `temenos serve [--port]` | run the per-user daemon (auto-spawned otherwise) |
| `temenos create [NAME] [flags]` | create/ensure a box in this project |
| `temenos ls` | list running boxes (project boxes marked) |
| `temenos exec [-it] NAME -- CMD…` | run a command in a box (`-it` = interactive PTY) |
| `temenos shell NAME` | interactive shell in a box (PTY) |
| `temenos rm NAME [--keep-data]` | stop + delete a box |
| `temenos audit NAME` · `diff NAME` | audit log / write-set manifest |
| `temenos claude [--box N] [flags] [-- claude-args]` | attach Claude with natives banned |
| `temenos version` | print version |
Box-creation flags (on `create` and `claude`): `--image NAME`, `--net`/`--no-net`,
`--scratch disk|memory`, `--force-memory`, `--ephemeral-fs` (never checkpoint),
`--no-autosave` (checkpoint only on close), `--ephemeral` (repo read-only),
`--volume HOST:TARGET[:ro|rw]`, `--memory MB`, `--cpu SECONDS`, `--global`.
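The `--volume HOST:TARGET[:ro|rw]` syntax is compact enough to parse by hand, as the sketch below shows. `VolumeSpec` and `parse_volume` are hypothetical names, and the default mode when the suffix is omitted is an assumption (read-only is chosen here as the conservative guess; the flag list above does not say which one temenos uses).

```python
from typing import NamedTuple


class VolumeSpec(NamedTuple):
    host: str
    target: str
    mode: str  # "ro" or "rw"


def parse_volume(spec: str, default_mode: str = "ro") -> VolumeSpec:
    """Hypothetical parser for HOST:TARGET[:ro|rw]; default_mode is an assumption."""
    parts = spec.split(":")
    if len(parts) == 2:
        host, target = parts
        mode = default_mode
    elif len(parts) == 3:
        host, target, mode = parts
    else:
        raise ValueError(f"expected HOST:TARGET[:ro|rw], got {spec!r}")
    if not host or not target:
        raise ValueError(f"host and target must be non-empty in {spec!r}")
    if mode not in ("ro", "rw"):
        raise ValueError(f"mode must be 'ro' or 'rw', got {mode!r}")
    return VolumeSpec(host, target, mode)


print(parse_volume("/data/models:/models:ro"))
print(parse_volume("/home/me/out:/out:rw"))
```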
The agent is trusted (you installed it; it authenticates as you; it isn't trying to escape). The code it runs is untrusted: model-authored shell/python that may be buggy, prompt-injected, or hostile. temenos's job is the sole-execution-path guarantee: every bit of that code goes through a box, and a box can't touch the host beyond its policy. That guarantee is what lets you take humans out of the loop at fleet scale.
| Property | Status (v1, gVisor) |
|---|---|
| Filesystem escape | blocked: host invisible beyond policy mounts; `/proc/1/root` is the box |
| Host writes outside policy | blocked: `/usr`, `/etc` read-only; writes go to an overlay |
| Network exfiltration | blocked with `--no-net` (isolated netns), but network is on by default (see limits) |
| Cross-box crosstalk | blocked: no writable mount is ever shared between boxes |
| Kernel-CVE surface | mostly blocked: gVisor intercepts syscalls in userspace |
| Memory/CPU/pid exhaustion | enforced via a per-box systemd scope (needs delegation, see below) |
Limits you should know about:
- Network is on by default, and it's a toggle, not a firewall. The default is full host passthrough with no filtering (localhost, LAN, cloud metadata, arbitrary egress); `--no-net` (`network=False`) fully isolates a box. This is the load-bearing gap for adversarial fleets: a swarm of network-on boxes is an exfiltration surface multiplied by N. Run untrusted/multi-tenant boxes with `--no-net`; filtered per-host egress is post-v1.
- Resource limits need systemd user-cgroup delegation. Without it, limits degrade to unenforced with a warning (`temenos doctor` shows the mode); don't run adversarial work there.
- Per-tenant authz/quotas are in progress. The box-per-owner isolation invariant holds today; tenant-scoped tokens and aggregate quotas are the platform-tier roadmap.
- WSL2 uses the `ptrace` platform (no `/dev/kvm`). Slower, but the security model (the gVisor sentry) is identical to kvm/systrap.
- Side channels between co-resident boxes are out of scope for v1.
- Not a defense against a malicious agent binary; see the threat model.
Run tests/leak/ against your host and re-run it when your harness upgrades (new tools are
new holes). A config isn't "supported" until it's green.
    Layer 3   surfaces   server/ (FastAPI REST + per-box MCP) · cli.py
    Layer 2½  registry   manager.py (BoxManager: ids, fleet lifecycle, checkpoint loop)
    Layer 2   box        box.py (exec/read/write/list, audit, checkpoint)
    Layer 1   backend    backends/ (gVisor: OCI bundle, held-run+exec, overlay, systemd scope)
    Layer 0   data       policy.py · result.py · storage.py · exceptions.py (pure, no OS calls)
`BoxManager` (Layer 2½) is the hinge: it's both the local-swarm registry and the multi-tenant
control plane, one piece of code with two reaches. Lower layers never import higher ones; REST, MCP,
and the CLI are all the same `Policy → Box → ExecResult` path. Delete `server/` and the core
still works.
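The "lower layers never import higher ones" rule is the kind of thing a small script can check. The sketch below is an assumption-laden illustration, not a tool from the repo: the layer numbers are read off the diagram above, the package name `temenos` is taken from the import examples, and `upward_imports` is a hypothetical helper that inspects source text with the standard `ast` module.

```python
import ast
from typing import Dict, List

# Layer numbers taken from the diagram; module names are the file stems shown there.
LAYER_OF: Dict[str, float] = {
    "policy": 0, "result": 0, "storage": 0, "exceptions": 0,
    "backends": 1, "box": 2, "manager": 2.5, "server": 3, "cli": 3,
}


def upward_imports(module: str, source: str) -> List[str]:
    """Return the higher-layer modules that `module` imports (ideally an empty list)."""
    own_layer = LAYER_OF[module]
    offenders: List[str] = []
    for node in ast.walk(ast.parse(source)):
        names: List[str] = []
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.level > 0:  # relative: from .manager import X
                names.append(node.module)
            elif node.module.startswith("temenos."):  # absolute: from temenos.manager import X
                names.append(node.module[len("temenos."):])
        elif isinstance(node, ast.Import):
            names.extend(a.name[len("temenos."):] for a in node.names
                         if a.name.startswith("temenos."))
        for name in names:
            top = name.split(".")[0]
            if LAYER_OF.get(top, -1) > own_layer:
                offenders.append(top)
    return offenders


ok_source = "from .policy import Policy\nimport os\n"
bad_source = "from .policy import Policy\nfrom temenos.manager import BoxManager\n"
print(upward_imports("box", ok_source), upward_imports("box", bad_source))
```

Note that `from . import name` style imports are not inspected by this sketch; a real check would need to handle them too.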
    pip install -e ".[all,dev]"
    PYTHONPATH=. pytest                                          # full suite
    PYTHONPATH=. pytest tests/leak/ -v                           # the containment gate (needs gVisor)
    TEMENOS_NET_TESTS=1 pytest tests/test_image_mmdebstrap.py    # opt-in network e2e

Tests that need gVisor / mmdebstrap / network are gated and skip cleanly without them.
Pre-1.0 (0.2.0). v1 is feature-complete and leak-tested on Linux + gVisor; the API may
still shift before 1.0. Roadmap, ordered by where the value is:
- Fleet fan-out ergonomics: `mgr.map(...)` over N boxes, batch lifecycle, aggregate audit.
- Filtered network egress: per-host SNI/allowlist proxy, so swarm boxes get contained network instead of all-or-nothing (the biggest gap for adversarial fleets).
- Per-tenant authz & quotas: tenant-scoped tokens, aggregate caps + backpressure.
- macOS (Seatbelt) backend: see `macos_plan.md`.
- True diff-vs-original, a remote (over-the-daemon) attach, persisted audit logs. (Local interactive PTY shells already work: `temenos shell` / `temenos exec -it`.)
temenos stands on:
- gVisor: the userspace kernel that is the actual sandbox.
- Model Context Protocol: the agent-facing tool plane.
temenos's contribution is the composition: trusted agents on the host, untrusted-code boxes underneath, one daemon that scales it from a single repo to a fleet, and the wiring that makes each box the sole execution path.
Apache-2.0 © temenos contributors