Self-Hosting Guide

This comprehensive guide walks you through deploying Simple Agent Manager (SAM) to your own infrastructure. Follow each section carefully—skipping steps is the most common cause of deployment issues.

Quick Start (Automated Deployment)

For the fastest deployment experience, use the automated GitHub Actions workflow with Pulumi infrastructure management. Deployment is automatic on every push to main.

For detailed step-by-step instructions, see the Quickstart Guide.

Prerequisites (One-Time Setup)

Fork this repository
Have a domain on Cloudflare (nameservers already pointed to Cloudflare — see Cloudflare Setup if not yet done)
Create a Cloudflare API Token — see the detailed permissions table below
Note your Account ID and Zone ID from the Cloudflare dashboard (domain overview, right sidebar)
Create an R2 API Token (separate from above - for Pulumi state storage):
- Go to Cloudflare Dashboard → R2 → Manage R2 API Tokens
- Create token with Object Read & Write permissions
- Note: The state bucket is created automatically by the workflow
Create GitHub App (see GitHub Setup below)
Generate a Pulumi passphrase for encrypting state:
```
openssl rand -base64 32
```

GitHub Environment Configuration

Automated deployment configuration lives in a GitHub Environment named production. This makes deployment inputs visible and editable in the GitHub UI. Runtime Worker vars that are not explicitly passed by the workflow still come from the checked-in top-level [vars] in apps/api/wrangler.toml.

Create the environment:

Go to your fork's Settings → Environments
Click New environment
Name it production and click Configure environment

Add environment variables (visible in UI):

Variable	Description	Example
`BASE_DOMAIN`	Your domain for the deployment	`example.com`
`RESOURCE_PREFIX`	Prefix for Cloudflare resources (optional)	`sam`
`PULUMI_STATE_BUCKET`	R2 bucket for Pulumi state (optional)	`sam-pulumi-state`

Optional feature flags (GitHub Environment variables):

Variable	Description	Default
`REQUIRE_APPROVAL`	Require admin approval for new users. First user becomes superadmin.	(unset — all users active)
`HETZNER_BASE_IMAGE`	Hetzner VM base image. Set to `ubuntu-24.04` for emergency rollback from the faster `docker-ce` marketplace default.	`docker-ce`

Optional runtime-config limit variables (Worker vars):

These are runtime Worker variables, not GitHub Environment variables in the current workflow. To change them for automated deployments, edit the top-level [vars] in apps/api/wrangler.toml before deploying, or extend .github/workflows/deploy-reusable.yml and scripts/deploy/sync-wrangler-config.ts to pass them through. Cloudflare Wrangler environment vars are non-inheritable, so the sync script copies top-level [vars] into the generated [env.production.vars] / [env.staging.vars] sections.

Variable	Description	Default
`MAX_PROJECT_RUNTIME_ENV_VARS_PER_PROJECT`	Max runtime env vars saved per project	`150`
`MAX_PROJECT_RUNTIME_FILES_PER_PROJECT`	Max runtime files saved per project	`50`
`MAX_PROJECT_RUNTIME_ENV_VALUE_BYTES`	Max bytes per runtime env var value	`8192`
`MAX_PROJECT_RUNTIME_FILE_CONTENT_BYTES`	Max bytes per runtime file content	`131072`
`MAX_PROJECT_RUNTIME_FILE_PATH_LENGTH`	Max runtime file path length (chars)	`256`

Optional AI task title generation variables (Worker vars):

Variable	Description	Default
`TASK_TITLE_MODEL`	Workers AI model for task title generation	`@cf/google/gemma-3-12b-it`
`TASK_TITLE_MAX_LENGTH`	Max characters in a generated title	`100`
`TASK_TITLE_TIMEOUT_MS`	Timeout (ms) for AI title generation before falling back to truncation	`5000`
`TASK_TITLE_GENERATION_ENABLED`	Set to `false` to disable AI generation entirely	`true`
`TASK_TITLE_SHORT_MESSAGE_THRESHOLD`	Messages at or below this length bypass AI	`100`
`TASK_TITLE_MAX_RETRIES`	Max retry attempts on AI generation failure (rate limit, transient errors)	`2`
`TASK_TITLE_RETRY_DELAY_MS`	Base delay (ms) between retries (exponential backoff: delay × 2^attempt)	`1000`
`TASK_TITLE_RETRY_MAX_DELAY_MS`	Max delay (ms) cap for retry backoff	`4000`

Add environment secrets (hidden):

Secret	Description
`CF_API_TOKEN`	Cloudflare API token with D1, KV, R2, DNS, Workers Scripts, Workers Observability, AI Gateway, Workers Routes, Pages, and SSL/Certificates permissions
`CF_ACCOUNT_ID`	Your Cloudflare account ID (32-char hex). Also used as a Worker secret for the admin observability log viewer.
`CF_ZONE_ID`	Your domain's zone ID (32-char hex)
`R2_ACCESS_KEY_ID`	R2 API token access key
`R2_SECRET_ACCESS_KEY`	R2 API token secret key
`PULUMI_CONFIG_PASSPHRASE`	Your generated passphrase
`GH_CLIENT_ID`	GitHub App client ID
`GH_CLIENT_SECRET`	GitHub App client secret
`GH_APP_ID`	GitHub App ID
`GH_APP_PRIVATE_KEY`	GitHub App private key (raw PEM or base64 encoded — both work)
`GH_APP_SLUG`	GitHub App slug (URL name)
`GH_WEBHOOK_SECRET`	GitHub webhook HMAC-SHA256 verification secret. Required when the GitHub App webhook is active; must match the GitHub App webhook secret exactly. The deploy workflow maps this to the Worker secret `GITHUB_WEBHOOK_SECRET`.

Optional secrets (TLS — usually not needed):

Secret	Description
`CF_ORIGIN_CA_KEY`	Deprecated fallback. Cloudflare Origin CA Key — only needed if your `CF_API_TOKEN` lacks the `Zone > SSL and Certificates > Edit` permission and you can't update it. The Origin CA Key is deprecated by Cloudflare (removal Sept 2026). Prefer adding the SSL permission to your API token instead.

Optional secrets (purpose-specific security overrides — recommended for production):

Secret	Description
`BETTER_AUTH_SECRET`	BetterAuth session signing/encryption (overrides `ENCRYPTION_KEY` for sessions)
`CREDENTIAL_ENCRYPTION_KEY`	AES-GCM encryption of user cloud credentials (overrides `ENCRYPTION_KEY` for credential storage)

Optional secrets (for GCP OIDC integration — see GCP Setup Guide for full instructions):

Secret	Description
`GOOGLE_CLIENT_ID`	Google Cloud Console OAuth 2.0 client ID (enables "Connect Google Cloud" in Settings)
`GOOGLE_CLIENT_SECRET`	Google Cloud Console OAuth 2.0 client secret

GCP OAuth Redirect URI: When creating a Google OAuth 2.0 client, add https://api.<YOUR_BASE_DOMAIN>/api/deployment/gcp/callback as an authorized redirect URI. This is a single static URI shared by all projects — no per-project URIs needed.

Optional GCP VM provisioning configuration (env vars, not secrets — sensible defaults provided):

Variable	Default	Description
`GCP_STS_SCOPE`	`https://www.googleapis.com/auth/cloud-platform`	OAuth scope for STS token exchange
`GCP_SA_IMPERSONATION_SCOPES`	`https://www.googleapis.com/auth/compute`	Comma-separated scopes for SA impersonation

For the full list of GCP configuration variables, see the GCP Setup Guide.

Optional GCP deployment configuration (for project-level Defang deployment — sensible defaults provided):

Variable	Default	Description
`GCP_DEPLOY_WIF_POOL_ID`	`sam-deploy-pool`	WIF pool ID for project-level deployment auth
`GCP_DEPLOY_WIF_PROVIDER_ID`	`sam-oidc`	OIDC provider within the deploy pool
`GCP_DEPLOY_SERVICE_ACCOUNT_ID`	`sam-deployer`	Service account for deployment operations
`GCP_DEPLOY_IDENTITY_TOKEN_EXPIRY_SECONDS`	`600`	Identity token lifetime in seconds

⚠️ Naming Convention — read this before troubleshooting "missing secret" errors: GitHub App secrets use GH_* prefix (not GITHUB_*) because GitHub Actions secret names cannot start with GITHUB_. The deployment workflow automatically maps GH_* → GITHUB_* when setting Cloudflare Worker secrets. If you see GITHUB_CLIENT_ID or GITHUB_WEBHOOK_SECRET in code or .env files, those are Worker-side names — use GH_CLIENT_ID and GH_WEBHOOK_SECRET in GitHub Environment secrets. Google OAuth secrets use GOOGLE_* directly.

Note: Security keys (ENCRYPTION_KEY, JWT_PRIVATE_KEY, JWT_PUBLIC_KEY) and TLS certificates (ORIGIN_CA_CERT, ORIGIN_CA_KEY) are automatically generated and persisted via Pulumi state in R2. No manual intervention required—keys are created on first deployment and reused automatically on subsequent deployments.

Deploy

Automatic deployment: Every push to main triggers a deployment automatically.

First deployment:

Configure the GitHub Environment (see above)
Push any commit to main, OR
Go to Actions → "Deploy" → "Run workflow" for manual trigger

Subsequent deployments: Just merge PRs to main. The workflow:

Validates all required configuration exists
Provisions infrastructure via Pulumi (idempotent)
Deploys API Worker and Web UI via Wrangler
Runs database migrations
Builds and uploads VM Agent binaries
Runs health check

Teardown

To remove all resources:

Go to Actions → "Teardown"
Click "Run workflow"
Type DELETE to confirm
Click "Run workflow"

For more control or troubleshooting, continue with the manual setup below.

Prerequisites & Preparation
Cloudflare Setup
GitHub Setup
Project Setup
Manual Building & Deployment (Optional)
DNS Configuration
Verification
Maintenance
Troubleshooting
Cost Estimation

Prerequisites & Preparation

Before starting, ensure you have the following ready.

Required Accounts

Account	Purpose	Tier Needed	Sign-up Link
Cloudflare	API hosting, DNS, storage	Free tier	cloudflare.com
GitHub	Authentication, repository access	Free tier	github.com
Domain Registrar	Your workspace domain	Any	(you likely already have one)

Note on cloud providers: SAM uses a Bring-Your-Own-Cloud (BYOC) model. Each user provides their own Hetzner (or other provider) API token through the Settings UI to create workspaces. You do not need a shared cloud provider account for the platform itself — Cloudflare is the only infrastructure the platform operator manages.

Required Tools

Install these on your development machine:

# Node.js 20+ (check version)
node --version  # Should be v20.x or higher

# pnpm 9+ (install if missing)
npm install -g pnpm
pnpm --version  # Should be 9.x or higher

# Go 1.22+ (needed to compile the VM Agent — the binary that runs on each workspace VM)
go version  # Should be go1.22.x or higher

# Git
git --version

Installing Go (if not installed):

macOS: brew install go
Ubuntu/Debian: sudo apt install golang-go (or use official installer)
Windows: Download from go.dev/dl

Preparation Checklist

All required accounts created
All tools installed and verified
A domain you control (e.g., example.com or workspaces.example.com)
30-60 minutes of uninterrupted time

Cloudflare Setup

This section covers setting up Cloudflare as your infrastructure provider.

Step 1: Add Your Domain to Cloudflare

If your domain is not already on Cloudflare:

Log in to Cloudflare Dashboard
Click "Add a Site" (or "Add site" button)
Enter your domain (e.g., example.com) and click Continue
Select the Free plan and click Continue
Cloudflare will scan your existing DNS records—review and click Continue
Important: Note the two nameservers Cloudflare assigns (e.g., ivy.ns.cloudflare.com, rudy.ns.cloudflare.com)

Step 2: Update Nameservers at Your Registrar

You must point your domain to Cloudflare's nameservers. This varies by registrar:

GoDaddy:

Go to my.godaddy.com → My Products → DNS
Click Nameservers → Change → Enter custom nameservers
Enter Cloudflare's nameservers, click Save

Namecheap:

Go to namecheap.com → Domain List → Manage
Under Nameservers, select Custom DNS
Enter Cloudflare's nameservers, click Save

Google Domains / Squarespace Domains:

Go to domains.squarespace.com
Select your domain → DNS → Nameservers → Use custom nameservers
Enter Cloudflare's nameservers

Other Registrars: Look for "Nameservers" or "DNS Settings" in your registrar's dashboard.

Important: Nameserver changes can take up to 24 hours to propagate. Cloudflare will email you when the domain is active.

Step 3: Find Your Account ID and Zone ID

You'll need these IDs for configuration:

In Cloudflare Dashboard, select your domain
Scroll down on the Overview page
In the right sidebar under API, you'll see:
- Zone ID: Copy this (32-character hex string)
- Account ID: Copy this (32-character hex string)

Save these values—you'll need them later.

Step 4: Create API Token with Required Permissions

SAM needs a Cloudflare API token with specific permissions:

Go to My Profile (top-right icon) → API Tokens
Click "Create Token"
Click "Create Custom Token" (not a template)
Configure the token:

Token name: simple-agent-manager

Permissions — add all of these. Each row maps to a single permission in the Cloudflare UI: select the Scope (Account or Zone), then the Category group, then the specific Permission and Access Level.

Scope	Category	Permission	Access Level
Account	Developer Platform	D1	Edit
Account	Developer Platform	Workers KV Storage	Edit
Account	Developer Platform	Workers R2 Storage	Edit
Account	Developer Platform	Workers Scripts	Edit
Account	Developer Platform	Workers Observability	Read
Account	Developer Platform	Pages	Edit
Account	AI	AI Gateway	Edit
Zone	Developer Platform	Workers Routes	Edit
Zone	SSL & Certificates	SSL and Certificates	Edit
Zone	DNS & Zone	DNS	Edit
Zone	DNS & Zone	Zone	Read

Zone Resources: Select Include → Specific zone → your domain

Account Resources: Select Include → Your account name

Click Continue to summary → Create Token
Copy the token immediately—it won't be shown again

Step 5: Create Cloudflare Resources

Note: If using the Quick Start (Automated Deployment), skip this step. Pulumi automatically creates D1, KV, and R2 resources when you push to main.

Manual resource creation (optional)

Open your terminal and run these commands:

# Login to Cloudflare via Wrangler
npx wrangler login

# Create D1 Database
npx wrangler d1 create workspaces
# Note the database_id from the output!

# Create KV Namespace for sessions
npx wrangler kv namespace create sessions
# Note the namespace id from the output!

# Create R2 Bucket for VM Agent binaries and task attachments
npx wrangler r2 bucket create workspaces-assets

R2 CORS Configuration (Required for Task Attachments)

Quick Start (Automated Deployment): R2 CORS is configured automatically on every deploy by scripts/deploy/configure-r2-cors.sh. Skip this section if you are using the automated deployment pipeline.

If you are deploying manually and want to enable file attachments on task submissions, configure CORS on the R2 bucket to allow direct browser uploads via presigned PUT URLs:

# Create a cors-rules.json file:
cat > cors-rules.json << 'CORS'
[
  {
    "AllowedOrigins": ["https://app.YOUR_DOMAIN"],
    "AllowedMethods": ["PUT"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3600
  }
]
CORS

# Apply CORS rules to the bucket (via S3-compatible API or Cloudflare Dashboard)
# Dashboard: R2 → workspaces-assets → Settings → CORS Policy

Replace YOUR_DOMAIN with your BASE_DOMAIN value (e.g., https://app.simple-agent-manager.org).

You also need R2 S3-compatible API credentials for presigned URL generation. Create these in the Cloudflare Dashboard under R2 → Manage R2 API Tokens, with Object Read & Write permissions scoped to the workspaces-assets bucket. Set R2_ACCESS_KEY_ID and R2_SECRET_ACCESS_KEY as Worker secrets.

Save these IDs from the command outputs:

D1 Database ID (e.g., abc123...)
KV Namespace ID (e.g., def456...)

GitHub Setup

SAM uses a single GitHub App for both user login (OAuth) and repository access.

Step 1: Create GitHub App

Go to GitHub App Settings
Click "New GitHub App"
Fill in the form:

Basic Information:

Field	Value
GitHub App name	Simple Agent Manager
Homepage URL	`https://app.example.com`

Identifying and authorizing users:

Field	Value
Callback URL	`https://api.example.com/api/auth/callback/github`
Expire user authorization tokens	✓ Checked
Request user authorization (OAuth) during installation	☐ Unchecked
Enable Device Flow	☐ Unchecked

Important: "Request user authorization (OAuth) during installation" MUST be unchecked. When checked, it disables the Setup URL and causes the post-installation redirect to hit the OAuth callback, which fails because BetterAuth didn't initiate the flow. Users log in separately via the app's login button.

Important: GitHub App user access tokens use app/user permissions (not OAuth scopes). SAM reads the user's primary email from GET /user/emails, so the Email addresses user permission must be granted.

Post installation:

Field	Value
Setup URL (optional)	`https://api.example.com/api/github/callback`
Redirect on update	✓ Checked

Note: The Setup URL points to the API, not the web UI. The API records the installation in the database and then redirects the user to https://app.example.com/settings.

Webhook:

Field	Value
Active	✓ Checked
Webhook URL	`https://api.example.com/api/github/webhook`
Webhook secret	Generate a random string and save the same value as the `GH_WEBHOOK_SECRET` GitHub Environment secret

Permissions:

Repository permissions:

Permission	Access
Contents	Read and write
Metadata	Read-only

Note: Contents requires Read and write access because workspaces need to commit and push code changes back to repositories.

Account permissions:

Permission	Access
Email addresses	Read-only

Note: SAM uses this permission to read the account's primary email from GET /user/emails. Without it, SAM falls back to the public profile email from GET /user, or a GitHub noreply fallback when no email is available.

Where can this GitHub App be installed?: Select based on your needs:

Only on this account: For personal use
Any account: For public/team use

Click "Create GitHub App"
Note the App ID (number shown at top)
Copy the Client ID and generate a Client Secret — you'll need both for OAuth login

Step 2: Generate GitHub App Private Key

On the GitHub App page, scroll to "Private keys"
Click "Generate a private key"
A .pem file will download automatically
Save this file securely—you'll need it for configuration

Project Setup

Step 1: Clone and Install

# Clone the repository
git clone https://github.com/your-org/simple-agent-manager.git
cd simple-agent-manager

# Install dependencies
pnpm install

Step 2: Generate Security Keys (Local Development Only)

Note: For production deployments, security keys are automatically managed by Pulumi and persist in R2. This step is only needed for local development.

# Generate JWT and encryption keys for local development
pnpm tsx scripts/deploy/generate-keys.ts

This generates:

ENCRYPTION_KEY: Shared fallback key — used for credential encryption, session management, and webhook verification when purpose-specific overrides are not set
JWT_PRIVATE_KEY: RSA private key for signing terminal access tokens
JWT_PUBLIC_KEY: RSA public key for token verification

Step 3: Configure Environment Variables (Local Development)

Note: For production deployment via GitHub Actions, use GitHub Environment Configuration instead. This step is only needed for local development.

Naming Convention: Local .env files use GITHUB_* prefix (e.g., GITHUB_CLIENT_ID) because that's what the Worker code reads. This differs from GitHub Environment secrets which use GH_* prefix. The deployment workflow maps between them.

Create your .env file:

cp .env.example .env

Edit .env with your values:

# Cloudflare Configuration
CF_API_TOKEN=your-cloudflare-api-token-from-step-4
CF_ZONE_ID=your-zone-id-from-step-3
CF_ACCOUNT_ID=your-account-id-from-step-3

# Domain Configuration
# Use your workspace subdomain (workspaces will be ws-xxx.workspaces.example.com)
BASE_DOMAIN=workspaces.example.com

# GitHub App (from GitHub App setup)
GITHUB_CLIENT_ID=Iv1.xxxxxxxxxxxx
GITHUB_CLIENT_SECRET=your-github-app-client-secret
GITHUB_APP_ID=123456
# For the private key, base64 encode the entire .pem file:
# cat your-key.pem | base64 -w0
GITHUB_APP_PRIVATE_KEY=LS0tLS1CRUdJTi4uLi4=

# Security Keys (from generate-keys.ts script)
ENCRYPTION_KEY=your-encryption-key-from-generate-keys
JWT_PRIVATE_KEY=your-jwt-private-key
JWT_PUBLIC_KEY=your-jwt-public-key

Step 4: Update wrangler.toml

Note: If using the Quick Start (Automated Deployment), skip this step. The sync-wrangler-config.ts script automatically updates wrangler.toml with Pulumi-provisioned resource IDs.

Manual configuration (for local development or manual deployment)

Edit apps/api/wrangler.toml with your resource IDs:

name = "workspaces-api"
main = "src/index.ts"
compatibility_date = "2024-01-01"
compatibility_flags = ["nodejs_compat"]

[vars]
BASE_DOMAIN = "workspaces.example.com"  # Your domain
VERSION = "1.0.0"

# D1 Database (use your database_id from Step 5)
[[d1_databases]]
binding = "DATABASE"
database_name = "workspaces"
database_id = "your-d1-database-id-here"

# KV Namespace (use your namespace id from Step 5)
[[kv_namespaces]]
binding = "KV"
id = "your-kv-namespace-id-here"

# R2 Bucket
[[r2_buckets]]
binding = "R2"
bucket_name = "workspaces-assets"

# Cron for provisioning timeout checks
[triggers]
crons = ["*/5 * * * *"]

Manual Building & Deployment (Optional)

Most users should skip this section. The Quick Start (Automated Deployment) handles all build, deploy, and configuration steps automatically via GitHub Actions. The manual steps below are only needed for local development, custom deployments, or troubleshooting.

Manual Deployment Steps

Step 1: Build All Packages

# Build TypeScript packages
pnpm build

Step 2: Build VM Agent (Go)

The VM Agent runs on workspace VMs and requires compilation:

cd packages/vm-agent

# Install Go dependencies
go mod download

# Build for Linux (VMs use Linux)
make build-all

This creates binaries in packages/vm-agent/bin/:

vm-agent-linux-amd64
vm-agent-linux-arm64
vm-agent-darwin-amd64 (for local testing)
vm-agent-darwin-arm64 (for local testing)

Step 3: Set Cloudflare Worker Secrets

Secrets must be set separately (not in wrangler.toml):

cd apps/api

# Set each secret (you'll be prompted for the value)
wrangler secret put CF_API_TOKEN
wrangler secret put CF_ACCOUNT_ID
wrangler secret put CF_ZONE_ID
wrangler secret put GITHUB_CLIENT_ID
wrangler secret put GITHUB_CLIENT_SECRET
wrangler secret put GITHUB_APP_ID
wrangler secret put GITHUB_APP_PRIVATE_KEY
wrangler secret put GITHUB_APP_SLUG
wrangler secret put GITHUB_WEBHOOK_SECRET
wrangler secret put ENCRYPTION_KEY
wrangler secret put JWT_PRIVATE_KEY
wrangler secret put JWT_PUBLIC_KEY
wrangler secret put ORIGIN_CA_CERT
wrangler secret put ORIGIN_CA_KEY
wrangler secret put TRIAL_CLAIM_TOKEN_SECRET

# Optional purpose-specific overrides (recommended for production)
# wrangler secret put BETTER_AUTH_SECRET
# wrangler secret put CREDENTIAL_ENCRYPTION_KEY

# Optional task attachment upload support
# wrangler secret put R2_ACCESS_KEY_ID
# wrangler secret put R2_SECRET_ACCESS_KEY

Tip: For multiline values (like private keys), you can pipe them:

cat path/to/github-app-key.pem | wrangler secret put GITHUB_APP_PRIVATE_KEY

Step 4: Run Database Migrations

# Apply migrations to production D1
wrangler d1 migrations apply workspaces --remote

Step 5: Deploy API

cd apps/api
wrangler deploy

Note the deployed URL (e.g., workspaces-api.your-subdomain.workers.dev)

Step 6: Deploy Web UI

cd apps/web
pnpm build
wrangler pages deploy dist --project-name simple-agent-manager

If this is your first Pages deployment, Wrangler will create the project. Note the URL (e.g., simple-agent-manager.pages.dev).

Step 7: Upload VM Agent to R2

cd packages/vm-agent

# Upload each binary
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-amd64 --file bin/vm-agent-linux-amd64 --remote
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-arm64 --file bin/vm-agent-linux-arm64 --remote

# Upload version info
echo '{"version": "1.0.0", "buildDate": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}' > bin/version.json
wrangler r2 object put workspaces-assets/agents/version.json --file bin/version.json --remote

Manual deployment note: The automated Pulumi workflow generates and persists ENCRYPTION_KEY, JWT_PRIVATE_KEY, JWT_PUBLIC_KEY, ORIGIN_CA_CERT, ORIGIN_CA_KEY, and TRIAL_CLAIM_TOKEN_SECRET. In the manual flow, you must generate and set those Worker secrets yourself. Use wrangler secret put <NAME> --env production if you deploy the Worker with a Wrangler environment.

DNS Configuration

Note: If using the Quick Start (Automated Deployment), DNS records are created automatically by Pulumi. This section is for manual deployment or reference.

Configure DNS records in Cloudflare to route traffic to your deployments.

Required DNS Records

In Cloudflare Dashboard → your domain → DNS:

Type	Name	Content	Proxy Status
CNAME	`api`	`workspaces-api.your-subdomain.workers.dev`	Proxied (orange)
CNAME	`app`	`simple-agent-manager.pages.dev`	Proxied (orange)
CNAME	`*`	`workspaces-api.your-subdomain.workers.dev`	Proxied (orange)

Notes:

The * (wildcard) record catches workspace subdomains (e.g., ws-abc123.workspaces.example.com)
The wildcard record should target the deployed API Worker hostname, matching the automated Pulumi deployment
All records should be proxied (orange cloud) for SSL and Workers routing
If you configure Worker routes manually, add routes for api.example.com/* and *.example.com/*, plus a more-specific *.vm.example.com/* route with no Worker script so VM-agent backend traffic bypasses the wildcard Worker route.

SSL/TLS Configuration

In Cloudflare Dashboard → your domain → SSL/TLS
Set encryption mode to Full (strict)
Under Edge Certificates, ensure:
- Always Use HTTPS: On
- Automatic HTTPS Rewrites: On

Cloudflare automatically provisions SSL certificates including wildcard (*.workspaces.example.com).

Verification

Test each component to ensure everything works.

Test 1: API Health Check

curl https://api.example.com/health
# Should return: {"status":"healthy","timestamp":"..."}

Test 2: Web UI Access

Open https://app.example.com in your browser. You should see the login page.

Test 3: GitHub OAuth Login

Click "Sign in with GitHub"
Authorize the OAuth application
You should be redirected back and see the dashboard

Test 4: Agent Binary Download

curl -I "https://api.example.com/api/agent/download?os=linux&arch=amd64"
# Should return: HTTP/2 200 with Content-Type: application/octet-stream

Test 5: Create a Workspace (Full E2E)

Add your Hetzner API token in Settings
Install the GitHub App on a test repository
Create a workspace from the dashboard
Wait for provisioning (2-5 minutes)
Connect to the terminal

Maintenance

Viewing Logs

# Stream real-time logs
wrangler tail

# Filter to errors only
wrangler tail --format=pretty --filter error

Node Log Configuration

Nodes use systemd journald for centralized log aggregation. The cloud-init template automatically configures journald and Docker logging on new nodes.

Journald configuration (applied via /etc/systemd/journald.conf.d/sam.conf):

Setting	Default	Description
`SystemMaxUse`	`500M`	Max disk space for journal
`SystemKeepFree`	`1G`	Minimum free disk to maintain
`MaxRetentionSec`	`7day`	Max log retention period
`Storage`	`persistent`	Persist logs across reboots
`Compress`	`yes`	Compress stored journal entries

These defaults can be overridden per-node by passing logJournalMaxUse, logJournalKeepFree, and logJournalMaxRetention to the cloud-init generator.

Docker logging: Docker is configured to use the journald log driver, so all container stdout/stderr flows into the same journal. This enables unified log viewing from the control plane UI.

VM Agent environment variables:

Variable	Default	Description
`LOG_LEVEL`	`info`	Agent log level (`debug`, `info`, `warn`, `error`)
`LOG_FORMAT`	`json`	Log output format (`json` or `text`)
`LOG_RETRIEVAL_DEFAULT_LIMIT`	`200`	Default entries per log page
`LOG_RETRIEVAL_MAX_LIMIT`	`1000`	Maximum entries per log page
`LOG_STREAM_BUFFER_SIZE`	`100`	Catch-up entries sent on stream connect
`LOG_READER_TIMEOUT`	`30s`	Timeout for journalctl read commands
`LOG_STREAM_PING_INTERVAL`	`30s`	WebSocket ping interval for log stream
`LOG_STREAM_PONG_TIMEOUT`	`90s`	WebSocket pong deadline for log stream
`SYSINFO_DOCKER_LIST_TIMEOUT`	`10s`	Timeout for `docker ps` command
`SYSINFO_DOCKER_STATS_TIMEOUT`	`10s`	Timeout for `docker stats` command

Updating the VM Agent

When you make changes to the VM Agent:

cd packages/vm-agent
make build-all

# Re-upload to R2
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-amd64 --file bin/vm-agent-linux-amd64 --remote
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-arm64 --file bin/vm-agent-linux-arm64 --remote

# Update version
echo '{"version": "1.0.1", "buildDate": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}' > bin/version.json
wrangler r2 object put workspaces-assets/agents/version.json --file bin/version.json --remote

Database Migrations

SAM uses two D1 databases:

DATABASE (workspaces): Core platform data (users, nodes, workspaces, projects, tasks)
OBSERVABILITY_DATABASE (observability): Error storage for the admin observability dashboard (spec 023). Isolated from the main database to prevent error volume from affecting core queries.

When schema changes are needed:

# Create a new migration for the main database
wrangler d1 migrations create workspaces your-migration-name

# Create a new migration for the observability database
wrangler d1 migrations create observability your-migration-name

# Apply all migrations to production (run-migrations.ts handles both databases)
pnpm tsx scripts/deploy/run-migrations.ts --env production

# Or apply individually
wrangler d1 migrations apply workspaces --remote
wrangler d1 migrations apply observability --remote

Note: Durable Object (DO) SQLite migrations are managed automatically. Each project's DO runs pending migrations in its constructor via blockConcurrencyWhile(). No manual migration step is needed for DO schemas.

Durable Object Configuration

SAM uses a per-project Durable Object (PROJECT_DATA) for chat sessions, messages, activity events, and real-time WebSocket streaming. This is configured automatically by Pulumi during deployment.

For manual deployments, ensure your wrangler.toml includes the DO binding:

[[durable_objects.bindings]]
name = "PROJECT_DATA"
class_name = "ProjectData"

[[migrations]]
tag = "v1"
new_sqlite_classes = ["ProjectData"]

Configurable DO limits (set as Worker vars or environment variables):

Variable	Description	Default
`MAX_SESSIONS_PER_PROJECT`	Max chat sessions per project	`10000`
`MAX_MESSAGES_PER_SESSION`	Max messages per chat session	`10000`
`MESSAGE_SIZE_THRESHOLD`	Max message size in bytes	`102400`
`ACTIVITY_RETENTION_DAYS`	Days to retain activity events	`90`
`SESSION_IDLE_TIMEOUT_MINUTES`	Idle session timeout	`60`
`DO_SUMMARY_SYNC_DEBOUNCE_MS`	Debounce for DO-to-D1 summary sync	`5000`
`DEFAULT_TASK_AGENT_TYPE`	Agent used for autonomous task execution	`opencode`
`WORKSPACE_IDLE_TIMEOUT_MS`	Global default idle timeout before workspace is stopped (overridable per-project)	`7200000` (2h)

See apps/api/.env.example for the full list of configurable variables.

Codex Token Refresh Proxy

SAM includes a centralized token refresh proxy for OpenAI Codex OAuth tokens. Codex uses rotating refresh tokens — when one instance refreshes, the old refresh token is permanently invalidated. If two workspaces refresh concurrently, one breaks permanently.

The proxy intercepts Codex refresh requests and serializes them per user via a Durable Object, preventing the race condition. This is enabled by default and requires no additional configuration.

Configurable variables:

Variable	Description	Default
`CODEX_REFRESH_PROXY_ENABLED`	Kill switch — set to `"false"` to disable	Enabled
`CODEX_REFRESH_LOCK_TIMEOUT_MS`	Per-user lock timeout	`30000` (30s)
`CODEX_REFRESH_UPSTREAM_URL`	OpenAI token endpoint	`https://auth.openai.com/oauth/token`
`CODEX_REFRESH_UPSTREAM_TIMEOUT_MS`	Upstream request timeout	`10000` (10s)
`CODEX_CLIENT_ID`	OpenAI OAuth client ID	`app_EMoamEEZ73f0CkXaXp7hrann`
`RATE_LIMIT_CODEX_REFRESH_PER_HOUR`	Max refresh requests per hour per user (enforced atomically via CodexRefreshLock DO ctx.storage)	`30`
`RATE_LIMIT_CODEX_REFRESH_WINDOW_SECONDS`	Rate limit window in seconds	`3600` (1 hour)
`CODEX_EXPECTED_SCOPES`	Comma-separated allowlist of OAuth scopes the upstream may return. Unexpected scopes block the refresh with 502 and the previous token remains valid. Empty string disables validation.	`openid,profile,email,offline_access`

Trial Onboarding

If you want to expose the zero-friction /try URL-to-workspace flow on your deployment, see trial-configuration.md for the required TRIAL_CLAIM_TOKEN_SECRET secret, tunable env vars (monthly cap, workspace TTL, data retention), and the KV-backed kill switch.

Rotating Security Keys

Security keys are managed by Pulumi and normally don't need rotation. If you need to rotate keys:

Option 1: Force Pulumi to recreate keys

# Remove protection from key resources (temporarily)
cd infra
pulumi state unprotect "urn:pulumi:prod::infra::random:index/randomId:RandomId::encryption-key"
pulumi state unprotect "urn:pulumi:prod::infra::tls:index/privateKey:PrivateKey::jwt-signing-key"

# Delete the resources
pulumi state delete "urn:pulumi:prod::infra::random:index/randomId:RandomId::encryption-key"
pulumi state delete "urn:pulumi:prod::infra::tls:index/privateKey:PrivateKey::jwt-signing-key"

# Re-deploy to create new keys
pulumi up

Option 2: Manual rotation

# Generate new keys locally
pnpm tsx scripts/deploy/generate-keys.ts

# Update secrets directly
cd apps/api
wrangler secret put JWT_PRIVATE_KEY
wrangler secret put JWT_PUBLIC_KEY
wrangler secret put ENCRYPTION_KEY

Warning: Rotating keys will:

Invalidate all active terminal sessions (JWT keys)
Make existing encrypted credentials unreadable (CREDENTIAL_ENCRYPTION_KEY, or ENCRYPTION_KEY if the override is not set) - users will need to re-enter their Hetzner tokens

Troubleshooting

Pulumi & Automated Deployment Issues

"error: failed to decrypt state"

Cause: PULUMI_CONFIG_PASSPHRASE doesn't match the one used when state was created.

Fix:

Use the same passphrase used during initial deployment

If you lost the passphrase, delete the stack in R2 and start fresh:

# In Cloudflare Dashboard → R2 → sam-pulumi-state bucket
# Delete the .pulumi/ folder for your stack

"error: failed to load checkpoint"

Cause: R2 backend connection failed or bucket doesn't exist.

Fix:

Verify the Pulumi state bucket exists in Cloudflare R2
Check R2 credentials (R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY) in your GitHub Environment
Verify the bucket name matches the PULUMI_STATE_BUCKET environment variable (default: sam-pulumi-state)

"error: stack 'prod' not found"

Cause: First deployment or stack was removed.

Fix: This is normal for first deployments. The workflow automatically creates the stack. If you see this after a previous deployment, the state may have been deleted.

"error: resource already exists"

Cause: Resource was created outside Pulumi or imported incorrectly.

Fix:

If the resource should be managed by Pulumi, import it:

pulumi import cloudflare:index/d1Database:D1Database sam-database <database-id>

Or delete the resource in Cloudflare Dashboard and re-run deployment

"You need a workers.dev subdomain in order to proceed"

Cause: Cron triggers (used for provisioning timeout checks) require the account's workers.dev subdomain to be initialized. The deploy workflow handles this automatically via the Cloudflare API, but it may fail if the API token lacks the Workers Scripts permission.

Fix:

Automatic: The deployment workflow includes an "Ensure workers.dev Subdomain" step that initializes it. Verify your API token has Account: Workers Scripts (Edit) permission.
Manual: Go to Cloudflare Dashboard → Workers & Pages → click on any worker → Settings → Domains & Routes → enable the workers.dev route.

"Unable to authenticate request" on Pages deploy

Cause: wrangler pages deploy needs the account ID but doesn't read it from wrangler.toml.

Fix: Ensure CF_ACCOUNT_ID is set as a secret in your GitHub Environment. The deploy workflow passes it as CLOUDFLARE_ACCOUNT_ID to the Pages deploy step.

"Deployment succeeded but health check failed"

Cause: Worker deployed but configuration issue preventing startup.

Fix:

Check worker logs: wrangler tail
Verify all secrets are set correctly
Check D1 migrations were applied

Admin Logs show "Cloudflare Observability API returned 403"

Cause: The CF_API_TOKEN is missing the "Workers Observability (Read)" permission, which is required for the admin log viewer.

Fix:

Go to Cloudflare Dashboard → My Profile → API Tokens
Edit the token used for SAM
Add permission: Account → Workers Observability → Read
Save the token
If the token was regenerated, update the CF_API_TOKEN secret in your GitHub Environment and redeploy

"Configure AI Gateway" fails with 403

Cause: The CF_API_TOKEN is missing the account-level "AI Gateway (Edit)" permission. The deploy workflow configures the account AI Gateway before deploying the Worker.

Fix:

Go to Cloudflare Dashboard → My Profile → API Tokens
Edit the token used for SAM
Add permission: Account → AI Gateway → Edit
Save the token
If the token was regenerated, update the CF_API_TOKEN secret in your GitHub Environment and redeploy

"OAuth callback failed" or BetterAuth "unknown" error

Cause: Callback URL mismatch or incorrect GitHub App settings

Fix:

Check your GitHub App's Callback URL matches exactly: https://api.example.com/api/auth/callback/github
Check your GitHub App's Setup URL is set to: https://api.example.com/api/github/callback
Ensure "Request user authorization (OAuth) during installation" is unchecked — when checked, it disables the Setup URL and causes post-installation redirects to hit BetterAuth, which fails
Ensure HTTPS is used (not HTTP)
Verify the domain in Cloudflare is active

"D1_ERROR: no such table"

Cause: Migrations haven't been applied

Fix:

wrangler d1 migrations apply workspaces --remote

"Failed to download agent binary"

Cause: R2 bucket not configured or binaries not uploaded

Fix:

Verify R2 bucket exists: wrangler r2 bucket list
Re-upload binaries (see Step 7 above)

"Workspace stuck in provisioning"

Cause: VM provisioning failed or agent didn't start

Fix:

Check Hetzner console for VM status
If VM is running, SSH in and check: systemctl status vm-agent
View cloud-init logs: cat /var/log/cloud-init-output.log

"pkcs8 must be PKCS#8 formatted string" / GitHub App installation fails

Cause: The GitHub App private key is stored in an unsupported format. GitHub App keys are generated as PKCS#1 (-----BEGIN RSA PRIVATE KEY-----), and the API automatically converts them to PKCS#8 format at runtime.

Fix:

Ensure the key is stored either as raw PEM or base64-encoded PEM (both work)
For base64 encoding: cat your-key.pem | base64 -w0
For raw PEM via wrangler: cat your-key.pem | wrangler secret put GITHUB_APP_PRIVATE_KEY
Make sure the key isn't truncated — PKCS#1 RSA 2048 keys are ~1700 characters

"JWT verification failed"

Cause: Key mismatch between API and expectations

Fix:

Ensure JWT_PUBLIC_KEY and JWT_PRIVATE_KEY are from the same key pair
Check keys aren't truncated (base64 encoding)
Regenerate keys if needed

"DNS_PROBE_FINISHED_NXDOMAIN"

Cause: DNS not propagated or misconfigured

Fix:

Verify nameservers changed at registrar
Check DNS records in Cloudflare dashboard
Wait up to 24 hours for propagation
Test with: dig +short api.example.com

Cost Estimation

Platform Costs (Your Infrastructure)

Component	Free Tier Limit	Paid Overage
Cloudflare Workers	100K requests/day	$0.15/million
Cloudflare D1	5M rows read/day	$0.001/million
Cloudflare KV	100K reads/day	$0.50/million
Cloudflare R2	10GB storage	$0.015/GB/month
Cloudflare Pages	Unlimited	Free

Typical SAM deployment: Stays within free tier for small to medium usage.

User VM Costs (Paid by Users)

Users provide their own Hetzner API token. Workspace VMs are billed to their account:

VM Size	Specs	Hourly	Monthly
Small (CX22)	2 vCPU, 4GB RAM	€0.006 (~$0.007)	€3.79 (~$4.15)
Medium (CX32)	4 vCPU, 8GB RAM	€0.011 (~$0.012)	€6.80 (~$7.50)
Large (CX42)	8 vCPU, 16GB RAM	€0.027 (~$0.030)	€16.40 (~$18)

VMs are billed hourly until they are explicitly stopped or deleted.

Security Considerations

Rotate Keys Regularly: Generate new JWT and encryption keys quarterly
Minimal GitHub App Permissions: Only Contents: Read and write (required for committing) and Metadata: Read-only
No Embedded Secrets: Bootstrap tokens ensure no secrets in cloud-init
HTTPS Only: All traffic is encrypted via Cloudflare
Session Security: BetterAuth handles secure session management

Getting Help

Issues: GitHub Issues
Documentation: docs/
Architecture: Architecture Decision Records

Last updated: 2026-04-14

FilesExpand file tree

self-hosting.md

Latest commit

History

self-hosting.md

File metadata and controls

Self-Hosting Guide

Quick Start (Automated Deployment)

Prerequisites (One-Time Setup)

GitHub Environment Configuration

Deploy

Teardown

Table of Contents

Prerequisites & Preparation

Required Accounts

Required Tools

Preparation Checklist

Cloudflare Setup

Step 1: Add Your Domain to Cloudflare

Step 2: Update Nameservers at Your Registrar

Step 3: Find Your Account ID and Zone ID

Step 4: Create API Token with Required Permissions

Step 5: Create Cloudflare Resources

R2 CORS Configuration (Required for Task Attachments)

GitHub Setup

Step 1: Create GitHub App

Step 2: Generate GitHub App Private Key

Project Setup

Step 1: Clone and Install

Step 2: Generate Security Keys (Local Development Only)

Step 3: Configure Environment Variables (Local Development)

Step 4: Update wrangler.toml

Manual Building & Deployment (Optional)

Step 1: Build All Packages

Step 2: Build VM Agent (Go)

Step 3: Set Cloudflare Worker Secrets

Step 4: Run Database Migrations

Step 5: Deploy API

Step 6: Deploy Web UI

Step 7: Upload VM Agent to R2

DNS Configuration

Required DNS Records

SSL/TLS Configuration

Verification

Test 1: API Health Check

Test 2: Web UI Access

Test 3: GitHub OAuth Login

Test 4: Agent Binary Download

Test 5: Create a Workspace (Full E2E)

Maintenance

Viewing Logs

Node Log Configuration

Updating the VM Agent

Database Migrations

Durable Object Configuration

Codex Token Refresh Proxy

Trial Onboarding

Rotating Security Keys

Troubleshooting

Pulumi & Automated Deployment Issues

"error: failed to decrypt state"

"error: failed to load checkpoint"

"error: stack 'prod' not found"

"error: resource already exists"

"You need a workers.dev subdomain in order to proceed"

"Unable to authenticate request" on Pages deploy

"Deployment succeeded but health check failed"

Admin Logs show "Cloudflare Observability API returned 403"

"Configure AI Gateway" fails with 403

"OAuth callback failed" or BetterAuth "unknown" error

"D1_ERROR: no such table"

"Failed to download agent binary"

"Workspace stuck in provisioning"

"pkcs8 must be PKCS#8 formatted string" / GitHub App installation fails

"JWT verification failed"

"DNS_PROBE_FINISHED_NXDOMAIN"

Cost Estimation

Platform Costs (Your Infrastructure)

User VM Costs (Paid by Users)

Security Considerations