This comprehensive guide walks you through deploying Simple Agent Manager (SAM) to your own infrastructure. Follow each section carefully—skipping steps is the most common cause of deployment issues.
For the fastest deployment experience, use the automated GitHub Actions workflow with Pulumi infrastructure management. Deployment is automatic on every push to main.
For detailed step-by-step instructions, see the Quickstart Guide.
- Fork this repository
- Have a domain on Cloudflare (nameservers already pointed to Cloudflare — see Cloudflare Setup if not yet done)
- Create a Cloudflare API Token — see the detailed permissions table below
- Note your Account ID and Zone ID from the Cloudflare dashboard (domain overview, right sidebar)
- Create an R2 API Token (separate from above - for Pulumi state storage):
- Go to Cloudflare Dashboard → R2 → Manage R2 API Tokens
- Create token with Object Read & Write permissions
- Note: The state bucket is created automatically by the workflow
- Create GitHub App (see GitHub Setup below)
- Generate a Pulumi passphrase for encrypting state:
openssl rand -base64 32
Automated deployment configuration lives in a GitHub Environment named production. This makes deployment inputs visible and editable in the GitHub UI. Runtime Worker vars that are not explicitly passed by the workflow still come from the checked-in top-level [vars] in apps/api/wrangler.toml.
Create the environment:
- Go to your fork's Settings → Environments
- Click New environment
- Name it `production` and click Configure environment
Add environment variables (visible in UI):
| Variable | Description | Example |
|---|---|---|
| `BASE_DOMAIN` | Your domain for the deployment | `example.com` |
| `RESOURCE_PREFIX` | Prefix for Cloudflare resources (optional) | `sam` |
| `PULUMI_STATE_BUCKET` | R2 bucket for Pulumi state (optional) | `sam-pulumi-state` |
Optional feature flags (GitHub Environment variables):
| Variable | Description | Default |
|---|---|---|
| `REQUIRE_APPROVAL` | Require admin approval for new users. First user becomes superadmin. | (unset — all users active) |
| `HETZNER_BASE_IMAGE` | Hetzner VM base image. Set to `ubuntu-24.04` for emergency rollback from the faster `docker-ce` marketplace default. | `docker-ce` |
Optional runtime-config limit variables (Worker vars):
These are runtime Worker variables, not GitHub Environment variables in the current workflow. To change them for automated deployments, edit the top-level [vars] in apps/api/wrangler.toml before deploying, or extend .github/workflows/deploy-reusable.yml and scripts/deploy/sync-wrangler-config.ts to pass them through. Cloudflare Wrangler environment vars are non-inheritable, so the sync script copies top-level [vars] into the generated [env.production.vars] / [env.staging.vars] sections.
| Variable | Description | Default |
|---|---|---|
| `MAX_PROJECT_RUNTIME_ENV_VARS_PER_PROJECT` | Max runtime env vars saved per project | 150 |
| `MAX_PROJECT_RUNTIME_FILES_PER_PROJECT` | Max runtime files saved per project | 50 |
| `MAX_PROJECT_RUNTIME_ENV_VALUE_BYTES` | Max bytes per runtime env var value | 8192 |
| `MAX_PROJECT_RUNTIME_FILE_CONTENT_BYTES` | Max bytes per runtime file content | 131072 |
| `MAX_PROJECT_RUNTIME_FILE_PATH_LENGTH` | Max runtime file path length (chars) | 256 |
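Because Wrangler environment vars are non-inheritable, the sync step effectively duplicates the top-level block into each environment. A sketch of what the generated file looks like (variable names from the table above; the exact output of `sync-wrangler-config.ts` may differ):

```toml
# apps/api/wrangler.toml — checked-in, top-level [vars]
[vars]
MAX_PROJECT_RUNTIME_ENV_VARS_PER_PROJECT = "150"
MAX_PROJECT_RUNTIME_FILES_PER_PROJECT = "50"

# Generated before deploy: the same keys are copied verbatim into each
# environment's vars section, because [env.production.vars] does NOT
# inherit from the top-level [vars].
[env.production.vars]
MAX_PROJECT_RUNTIME_ENV_VARS_PER_PROJECT = "150"
MAX_PROJECT_RUNTIME_FILES_PER_PROJECT = "50"
```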
Optional AI task title generation variables (Worker vars):
| Variable | Description | Default |
|---|---|---|
| `TASK_TITLE_MODEL` | Workers AI model for task title generation | `@cf/google/gemma-3-12b-it` |
| `TASK_TITLE_MAX_LENGTH` | Max characters in a generated title | 100 |
| `TASK_TITLE_TIMEOUT_MS` | Timeout (ms) for AI title generation before falling back to truncation | 5000 |
| `TASK_TITLE_GENERATION_ENABLED` | Set to `false` to disable AI generation entirely | `true` |
| `TASK_TITLE_SHORT_MESSAGE_THRESHOLD` | Messages at or below this length bypass AI | 100 |
| `TASK_TITLE_MAX_RETRIES` | Max retry attempts on AI generation failure (rate limit, transient errors) | 2 |
| `TASK_TITLE_RETRY_DELAY_MS` | Base delay (ms) between retries (exponential backoff: delay × 2^attempt) | 1000 |
| `TASK_TITLE_RETRY_MAX_DELAY_MS` | Max delay (ms) cap for retry backoff | 4000 |
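The retry delays combine as base × 2^attempt, capped at the max. A quick sketch with the defaults (whether attempt counting starts at 0 is an assumption here):

```shell
base_ms=1000   # TASK_TITLE_RETRY_DELAY_MS
cap_ms=4000    # TASK_TITLE_RETRY_MAX_DELAY_MS
retries=2      # TASK_TITLE_MAX_RETRIES

for attempt in $(seq 0 $((retries - 1))); do
  # Exponential backoff: base doubles each attempt, then apply the cap
  delay=$(( base_ms * (1 << attempt) ))
  if [ "$delay" -gt "$cap_ms" ]; then delay=$cap_ms; fi
  echo "retry $((attempt + 1)) after ${delay}ms"
done
```

With the defaults this waits 1000 ms, then 2000 ms; a third retry (if configured) would be capped at 4000 ms rather than reaching 4000 × 2.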
Add environment secrets (hidden):
| Secret | Description |
|---|---|
| `CF_API_TOKEN` | Cloudflare API token with D1, KV, R2, DNS, Workers Scripts, Workers Observability, AI Gateway, Workers Routes, Pages, and SSL/Certificates permissions |
| `CF_ACCOUNT_ID` | Your Cloudflare account ID (32-char hex). Also used as a Worker secret for the admin observability log viewer. |
| `CF_ZONE_ID` | Your domain's zone ID (32-char hex) |
| `R2_ACCESS_KEY_ID` | R2 API token access key |
| `R2_SECRET_ACCESS_KEY` | R2 API token secret key |
| `PULUMI_CONFIG_PASSPHRASE` | Your generated passphrase |
| `GH_CLIENT_ID` | GitHub App client ID |
| `GH_CLIENT_SECRET` | GitHub App client secret |
| `GH_APP_ID` | GitHub App ID |
| `GH_APP_PRIVATE_KEY` | GitHub App private key (raw PEM or base64 encoded — both work) |
| `GH_APP_SLUG` | GitHub App slug (URL name) |
| `GH_WEBHOOK_SECRET` | GitHub webhook HMAC-SHA256 verification secret. Required when the GitHub App webhook is active; must match the GitHub App webhook secret exactly. The deploy workflow maps this to the Worker secret `GITHUB_WEBHOOK_SECRET`. |
Optional secrets (TLS — usually not needed):
| Secret | Description |
|---|---|
| `CF_ORIGIN_CA_KEY` | Deprecated fallback. Cloudflare Origin CA Key — only needed if your `CF_API_TOKEN` lacks the Zone > SSL and Certificates > Edit permission and you can't update it. The Origin CA Key is deprecated by Cloudflare (removal Sept 2026). Prefer adding the SSL permission to your API token instead. |
Optional secrets (purpose-specific security overrides — recommended for production):
| Secret | Description |
|---|---|
| `BETTER_AUTH_SECRET` | BetterAuth session signing/encryption (overrides `ENCRYPTION_KEY` for sessions) |
| `CREDENTIAL_ENCRYPTION_KEY` | AES-GCM encryption of user cloud credentials (overrides `ENCRYPTION_KEY` for credential storage) |
Optional secrets (for GCP OIDC integration — see GCP Setup Guide for full instructions):
| Secret | Description |
|---|---|
| `GOOGLE_CLIENT_ID` | Google Cloud Console OAuth 2.0 client ID (enables "Connect Google Cloud" in Settings) |
| `GOOGLE_CLIENT_SECRET` | Google Cloud Console OAuth 2.0 client secret |
GCP OAuth Redirect URI: When creating a Google OAuth 2.0 client, add `https://api.<YOUR_BASE_DOMAIN>/api/deployment/gcp/callback` as an authorized redirect URI. This is a single static URI shared by all projects — no per-project URIs needed.
Optional GCP VM provisioning configuration (env vars, not secrets — sensible defaults provided):
| Variable | Default | Description |
|---|---|---|
| `GCP_STS_SCOPE` | `https://www.googleapis.com/auth/cloud-platform` | OAuth scope for STS token exchange |
| `GCP_SA_IMPERSONATION_SCOPES` | `https://www.googleapis.com/auth/compute` | Comma-separated scopes for SA impersonation |
For the full list of GCP configuration variables, see the GCP Setup Guide.
Optional GCP deployment configuration (for project-level Defang deployment — sensible defaults provided):
| Variable | Default | Description |
|---|---|---|
| `GCP_DEPLOY_WIF_POOL_ID` | `sam-deploy-pool` | WIF pool ID for project-level deployment auth |
| `GCP_DEPLOY_WIF_PROVIDER_ID` | `sam-oidc` | OIDC provider within the deploy pool |
| `GCP_DEPLOY_SERVICE_ACCOUNT_ID` | `sam-deployer` | Service account for deployment operations |
| `GCP_DEPLOY_IDENTITY_TOKEN_EXPIRY_SECONDS` | `600` | Identity token lifetime in seconds |
⚠️ Naming Convention — read this before troubleshooting "missing secret" errors: GitHub App secrets use the `GH_*` prefix (not `GITHUB_*`) because GitHub Actions secret names cannot start with `GITHUB_`. The deployment workflow automatically maps `GH_*` → `GITHUB_*` when setting Cloudflare Worker secrets. If you see `GITHUB_CLIENT_ID` or `GITHUB_WEBHOOK_SECRET` in code or `.env` files, those are Worker-side names — use `GH_CLIENT_ID` and `GH_WEBHOOK_SECRET` in GitHub Environment secrets. Google OAuth secrets use `GOOGLE_*` directly.
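The mapping is mechanical: strip the `GH_` prefix and prepend `GITHUB_`. A hypothetical sketch of what the workflow does internally (the real workflow step may be structured differently):

```shell
# Map GitHub Environment secret names to Worker-side secret names:
# GH_CLIENT_ID -> GITHUB_CLIENT_ID, GH_WEBHOOK_SECRET -> GITHUB_WEBHOOK_SECRET, ...
for name in GH_CLIENT_ID GH_CLIENT_SECRET GH_APP_ID GH_APP_PRIVATE_KEY GH_APP_SLUG GH_WEBHOOK_SECRET; do
  worker_name="GITHUB_${name#GH_}"   # remove the GH_ prefix, add GITHUB_
  echo "$name -> $worker_name"
done
```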
Note: Security keys (`ENCRYPTION_KEY`, `JWT_PRIVATE_KEY`, `JWT_PUBLIC_KEY`) and TLS certificates (`ORIGIN_CA_CERT`, `ORIGIN_CA_KEY`) are automatically generated and persisted via Pulumi state in R2. No manual intervention required — keys are created on first deployment and reused automatically on subsequent deployments.
Automatic deployment: Every push to main triggers a deployment automatically.
First deployment:
- Configure the GitHub Environment (see above)
- Push any commit to `main`, OR
- Go to Actions → "Deploy" → "Run workflow" for manual trigger
Subsequent deployments: Just merge PRs to main. The workflow:
- Validates all required configuration exists
- Provisions infrastructure via Pulumi (idempotent)
- Deploys API Worker and Web UI via Wrangler
- Runs database migrations
- Builds and uploads VM Agent binaries
- Runs health check
To remove all resources:
- Go to Actions → "Teardown"
- Click "Run workflow"
- Type `DELETE` to confirm
- Click "Run workflow"
For more control or troubleshooting, continue with the manual setup below.
- Prerequisites & Preparation
- Cloudflare Setup
- GitHub Setup
- Project Setup
- Manual Building & Deployment (Optional)
- DNS Configuration
- Verification
- Maintenance
- Troubleshooting
- Cost Estimation
Before starting, ensure you have the following ready.
| Account | Purpose | Tier Needed | Sign-up Link |
|---|---|---|---|
| Cloudflare | API hosting, DNS, storage | Free tier | cloudflare.com |
| GitHub | Authentication, repository access | Free tier | github.com |
| Domain Registrar | Your workspace domain | Any | (you likely already have one) |
Note on cloud providers: SAM uses a Bring-Your-Own-Cloud (BYOC) model. Each user provides their own Hetzner (or other provider) API token through the Settings UI to create workspaces. You do not need a shared cloud provider account for the platform itself — Cloudflare is the only infrastructure the platform operator manages.
Install these on your development machine:
# Node.js 20+ (check version)
node --version # Should be v20.x or higher
# pnpm 9+ (install if missing)
npm install -g pnpm
pnpm --version # Should be 9.x or higher
# Go 1.22+ (needed to compile the VM Agent — the binary that runs on each workspace VM)
go version # Should be go1.22.x or higher
# Git
git --version

Installing Go (if not installed):

- macOS: `brew install go`
- Ubuntu/Debian: `sudo apt install golang-go` (or use the official installer)
- Windows: Download from go.dev/dl
- All required accounts created
- All tools installed and verified
- A domain you control (e.g., `example.com` or `workspaces.example.com`)
- 30-60 minutes of uninterrupted time
This section covers setting up Cloudflare as your infrastructure provider.
If your domain is not already on Cloudflare:
- Log in to Cloudflare Dashboard
- Click "Add a Site" (or "Add site" button)
- Enter your domain (e.g., `example.com`) and click Continue
- Select the Free plan and click Continue
- Cloudflare will scan your existing DNS records—review and click Continue
- Important: Note the two nameservers Cloudflare assigns (e.g., `ivy.ns.cloudflare.com`, `rudy.ns.cloudflare.com`)
You must point your domain to Cloudflare's nameservers. This varies by registrar:
GoDaddy:
- Go to my.godaddy.com → My Products → DNS
- Click Nameservers → Change → Enter custom nameservers
- Enter Cloudflare's nameservers, click Save
Namecheap:
- Go to namecheap.com → Domain List → Manage
- Under Nameservers, select Custom DNS
- Enter Cloudflare's nameservers, click Save
Google Domains / Squarespace Domains:
- Go to domains.squarespace.com
- Select your domain → DNS → Nameservers → Use custom nameservers
- Enter Cloudflare's nameservers
Other Registrars: Look for "Nameservers" or "DNS Settings" in your registrar's dashboard.
Important: Nameserver changes can take up to 24 hours to propagate. Cloudflare will email you when the domain is active.
You'll need these IDs for configuration:
- In Cloudflare Dashboard, select your domain
- Scroll down on the Overview page
- In the right sidebar under API, you'll see:
- Zone ID: Copy this (32-character hex string)
- Account ID: Copy this (32-character hex string)
Save these values—you'll need them later.
SAM needs a Cloudflare API token with specific permissions:
- Go to My Profile (top-right icon) → API Tokens
- Click "Create Token"
- Click "Create Custom Token" (not a template)
- Configure the token:
Token name: simple-agent-manager
Permissions — add all of these. Each row maps to a single permission in the Cloudflare UI: select the Scope (Account or Zone), then the Category group, then the specific Permission and Access Level.
| Scope | Category | Permission | Access Level |
|---|---|---|---|
| Account | Developer Platform | D1 | Edit |
| Account | Developer Platform | Workers KV Storage | Edit |
| Account | Developer Platform | Workers R2 Storage | Edit |
| Account | Developer Platform | Workers Scripts | Edit |
| Account | Developer Platform | Workers Observability | Read |
| Account | Developer Platform | Pages | Edit |
| Account | AI | AI Gateway | Edit |
| Zone | Developer Platform | Workers Routes | Edit |
| Zone | SSL & Certificates | SSL and Certificates | Edit |
| Zone | DNS & Zone | DNS | Edit |
| Zone | DNS & Zone | Zone | Read |
Zone Resources: Select Include → Specific zone → your domain
Account Resources: Select Include → Your account name
- Click Continue to summary → Create Token
- Copy the token immediately—it won't be shown again
Note: If using the Quick Start (Automated Deployment), skip this step. Pulumi automatically creates D1, KV, and R2 resources when you push to main.
Manual resource creation (optional)
Open your terminal and run these commands:
# Login to Cloudflare via Wrangler
npx wrangler login
# Create D1 Database
npx wrangler d1 create workspaces
# Note the database_id from the output!
# Create KV Namespace for sessions
npx wrangler kv namespace create sessions
# Note the namespace id from the output!
# Create R2 Bucket for VM Agent binaries and task attachments
npx wrangler r2 bucket create workspaces-assets

Quick Start (Automated Deployment): R2 CORS is configured automatically on every deploy by `scripts/deploy/configure-r2-cors.sh`. Skip this section if you are using the automated deployment pipeline.
If you are deploying manually and want to enable file attachments on task submissions, configure CORS on the R2 bucket to allow direct browser uploads via presigned PUT URLs:
# Create a cors-rules.json file:
cat > cors-rules.json << 'CORS'
[
{
"AllowedOrigins": ["https://app.YOUR_DOMAIN"],
"AllowedMethods": ["PUT"],
"AllowedHeaders": ["*"],
"ExposeHeaders": ["ETag"],
"MaxAgeSeconds": 3600
}
]
CORS
# Apply CORS rules to the bucket (via S3-compatible API or Cloudflare Dashboard)
# Dashboard: R2 → workspaces-assets → Settings → CORS Policy

Replace YOUR_DOMAIN with your `BASE_DOMAIN` value (e.g., `https://app.simple-agent-manager.org`).
You also need R2 S3-compatible API credentials for presigned URL generation. Create these in the Cloudflare Dashboard under R2 → Manage R2 API Tokens, with Object Read & Write permissions scoped to the workspaces-assets bucket. Set R2_ACCESS_KEY_ID and R2_SECRET_ACCESS_KEY as Worker secrets.
Save these IDs from the command outputs:
- D1 Database ID (e.g., `abc123...`)
- KV Namespace ID (e.g., `def456...`)
SAM uses a single GitHub App for both user login (OAuth) and repository access.
- Go to GitHub App Settings
- Click "New GitHub App"
- Fill in the form:
Basic Information:
| Field | Value |
|---|---|
| GitHub App name | Simple Agent Manager |
| Homepage URL | https://app.example.com |
Identifying and authorizing users:
| Field | Value |
|---|---|
| Callback URL | https://api.example.com/api/auth/callback/github |
| Expire user authorization tokens | ✓ Checked |
| Request user authorization (OAuth) during installation | ☐ Unchecked |
| Enable Device Flow | ☐ Unchecked |
Important: "Request user authorization (OAuth) during installation" MUST be unchecked. When checked, it disables the Setup URL and causes the post-installation redirect to hit the OAuth callback, which fails because BetterAuth didn't initiate the flow. Users log in separately via the app's login button.
Important: GitHub App user access tokens use app/user permissions (not OAuth scopes). SAM reads the user's primary email from `GET /user/emails`, so the Email addresses user permission must be granted.
Post installation:
| Field | Value |
|---|---|
| Setup URL (optional) | https://api.example.com/api/github/callback |
| Redirect on update | ✓ Checked |
Note: The Setup URL points to the API, not the web UI. The API records the installation in the database and then redirects the user to `https://app.example.com/settings`.
Webhook:
| Field | Value |
|---|---|
| Active | ✓ Checked |
| Webhook URL | https://api.example.com/api/github/webhook |
| Webhook secret | Generate a random string and save the same value as the GH_WEBHOOK_SECRET GitHub Environment secret |
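Any high-entropy random string works as the webhook secret. One option among many:

```shell
# Generate a 64-hex-character webhook secret; paste the same value into
# both the GitHub App webhook form and the GH_WEBHOOK_SECRET environment secret
openssl rand -hex 32
```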
Permissions:
Repository permissions:
| Permission | Access |
|---|---|
| Contents | Read and write |
| Metadata | Read-only |
Note: Contents requires Read and write access because workspaces need to commit and push code changes back to repositories.
Account permissions:
| Permission | Access |
|---|---|
| Email addresses | Read-only |
Note: SAM uses this permission to read the account's primary email from `GET /user/emails`. Without it, SAM falls back to the public profile email from `GET /user`, or a GitHub noreply fallback when no email is available.
Where can this GitHub App be installed?: Select based on your needs:
- Only on this account: For personal use
- Any account: For public/team use
- Click "Create GitHub App"
- Note the App ID (number shown at top)
- Copy the Client ID and generate a Client Secret — you'll need both for OAuth login
- On the GitHub App page, scroll to "Private keys"
- Click "Generate a private key"
- A `.pem` file will download automatically
- Save this file securely—you'll need it for configuration
# Clone the repository
git clone https://github.com/your-org/simple-agent-manager.git
cd simple-agent-manager
# Install dependencies
pnpm install

Note: For production deployments, security keys are automatically managed by Pulumi and persist in R2. This step is only needed for local development.
# Generate JWT and encryption keys for local development
pnpm tsx scripts/deploy/generate-keys.ts

This generates:
- ENCRYPTION_KEY: Shared fallback key — used for credential encryption, session management, and webhook verification when purpose-specific overrides are not set
- JWT_PRIVATE_KEY: RSA private key for signing terminal access tokens
- JWT_PUBLIC_KEY: RSA public key for token verification
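If you want to see what these keys look like, roughly equivalent material can be produced with plain `openssl` (a sketch only; the script's exact key formats and encodings may differ, so prefer `generate-keys.ts` for real setups):

```shell
# Shared fallback encryption key: 32 random bytes, base64-encoded
ENCRYPTION_KEY=$(openssl rand -base64 32)

# RSA keypair for terminal-access JWTs (PKCS#8 private key, SPKI public key)
JWT_PRIVATE_KEY=$(openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 2>/dev/null)
JWT_PUBLIC_KEY=$(printf '%s\n' "$JWT_PRIVATE_KEY" | openssl pkey -pubout 2>/dev/null)

echo "$JWT_PUBLIC_KEY" | head -1
```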
Note: For production deployment via GitHub Actions, use GitHub Environment Configuration instead. This step is only needed for local development.
Naming Convention: Local `.env` files use the `GITHUB_*` prefix (e.g., `GITHUB_CLIENT_ID`) because that's what the Worker code reads. This differs from GitHub Environment secrets, which use the `GH_*` prefix. The deployment workflow maps between them.
Create your .env file:
cp .env.example .env

Edit `.env` with your values:
# Cloudflare Configuration
CF_API_TOKEN=your-cloudflare-api-token-from-step-4
CF_ZONE_ID=your-zone-id-from-step-3
CF_ACCOUNT_ID=your-account-id-from-step-3
# Domain Configuration
# Use your workspace subdomain (workspaces will be ws-xxx.workspaces.example.com)
BASE_DOMAIN=workspaces.example.com
# GitHub App (from GitHub App setup)
GITHUB_CLIENT_ID=Iv1.xxxxxxxxxxxx
GITHUB_CLIENT_SECRET=your-github-app-client-secret
GITHUB_APP_ID=123456
# For the private key, base64 encode the entire .pem file:
# cat your-key.pem | base64 -w0
GITHUB_APP_PRIVATE_KEY=LS0tLS1CRUdJTi4uLi4=
# Security Keys (from generate-keys.ts script)
ENCRYPTION_KEY=your-encryption-key-from-generate-keys
JWT_PRIVATE_KEY=your-jwt-private-key
JWT_PUBLIC_KEY=your-jwt-public-key

Note: If using the Quick Start (Automated Deployment), skip this step. The `sync-wrangler-config.ts` script automatically updates wrangler.toml with Pulumi-provisioned resource IDs.
Manual configuration (for local development or manual deployment)
Edit apps/api/wrangler.toml with your resource IDs:
name = "workspaces-api"
main = "src/index.ts"
compatibility_date = "2024-01-01"
compatibility_flags = ["nodejs_compat"]
[vars]
BASE_DOMAIN = "workspaces.example.com" # Your domain
VERSION = "1.0.0"
# D1 Database (use your database_id from Step 5)
[[d1_databases]]
binding = "DATABASE"
database_name = "workspaces"
database_id = "your-d1-database-id-here"
# KV Namespace (use your namespace id from Step 5)
[[kv_namespaces]]
binding = "KV"
id = "your-kv-namespace-id-here"
# R2 Bucket
[[r2_buckets]]
binding = "R2"
bucket_name = "workspaces-assets"
# Cron for provisioning timeout checks
[triggers]
crons = ["*/5 * * * *"]

Most users should skip this section. The Quick Start (Automated Deployment) handles all build, deploy, and configuration steps automatically via GitHub Actions. The manual steps below are only needed for local development, custom deployments, or troubleshooting.
Manual Deployment Steps
# Build TypeScript packages
pnpm build

The VM Agent runs on workspace VMs and requires compilation:
cd packages/vm-agent
# Install Go dependencies
go mod download
# Build for Linux (VMs use Linux)
make build-all

This creates binaries in `packages/vm-agent/bin/`:

- `vm-agent-linux-amd64`
- `vm-agent-linux-arm64`
- `vm-agent-darwin-amd64` (for local testing)
- `vm-agent-darwin-arm64` (for local testing)
Secrets must be set separately (not in wrangler.toml):
cd apps/api
# Set each secret (you'll be prompted for the value)
wrangler secret put CF_API_TOKEN
wrangler secret put CF_ACCOUNT_ID
wrangler secret put CF_ZONE_ID
wrangler secret put GITHUB_CLIENT_ID
wrangler secret put GITHUB_CLIENT_SECRET
wrangler secret put GITHUB_APP_ID
wrangler secret put GITHUB_APP_PRIVATE_KEY
wrangler secret put GITHUB_APP_SLUG
wrangler secret put GITHUB_WEBHOOK_SECRET
wrangler secret put ENCRYPTION_KEY
wrangler secret put JWT_PRIVATE_KEY
wrangler secret put JWT_PUBLIC_KEY
wrangler secret put ORIGIN_CA_CERT
wrangler secret put ORIGIN_CA_KEY
wrangler secret put TRIAL_CLAIM_TOKEN_SECRET
# Optional purpose-specific overrides (recommended for production)
# wrangler secret put BETTER_AUTH_SECRET
# wrangler secret put CREDENTIAL_ENCRYPTION_KEY
# Optional task attachment upload support
# wrangler secret put R2_ACCESS_KEY_ID
# wrangler secret put R2_SECRET_ACCESS_KEY

Tip: For multiline values (like private keys), you can pipe them:

cat path/to/github-app-key.pem | wrangler secret put GITHUB_APP_PRIVATE_KEY

# Apply migrations to production D1
wrangler d1 migrations apply workspaces --remote

cd apps/api
wrangler deploy

Note the deployed URL (e.g., `workspaces-api.your-subdomain.workers.dev`)
cd apps/web
pnpm build
wrangler pages deploy dist --project-name simple-agent-manager

If this is your first Pages deployment, Wrangler will create the project. Note the URL (e.g., `simple-agent-manager.pages.dev`).
cd packages/vm-agent
# Upload each binary
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-amd64 --file bin/vm-agent-linux-amd64 --remote
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-arm64 --file bin/vm-agent-linux-arm64 --remote
# Upload version info
echo '{"version": "1.0.0", "buildDate": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}' > bin/version.json
wrangler r2 object put workspaces-assets/agents/version.json --file bin/version.json --remote

Manual deployment note: The automated Pulumi workflow generates and persists `ENCRYPTION_KEY`, `JWT_PRIVATE_KEY`, `JWT_PUBLIC_KEY`, `ORIGIN_CA_CERT`, `ORIGIN_CA_KEY`, and `TRIAL_CLAIM_TOKEN_SECRET`. In the manual flow, you must generate and set those Worker secrets yourself. Use `wrangler secret put <NAME> --env production` if you deploy the Worker with a Wrangler environment.
Note: If using the Quick Start (Automated Deployment), DNS records are created automatically by Pulumi. This section is for manual deployment or reference.
Configure DNS records in Cloudflare to route traffic to your deployments.
In Cloudflare Dashboard → your domain → DNS:
| Type | Name | Content | Proxy Status |
|---|---|---|---|
| CNAME | api | `workspaces-api.your-subdomain.workers.dev` | Proxied (orange) |
| CNAME | app | `simple-agent-manager.pages.dev` | Proxied (orange) |
| CNAME | * | `workspaces-api.your-subdomain.workers.dev` | Proxied (orange) |
Notes:
- The `*` (wildcard) record catches workspace subdomains (e.g., `ws-abc123.workspaces.example.com`)
- The wildcard record should target the deployed API Worker hostname, matching the automated Pulumi deployment
- All records should be proxied (orange cloud) for SSL and Workers routing
- If you configure Worker routes manually, add routes for `api.example.com/*` and `*.example.com/*`, plus a more-specific `*.vm.example.com/*` route with no Worker script so VM-agent backend traffic bypasses the wildcard Worker route.
- In Cloudflare Dashboard → your domain → SSL/TLS
- Set encryption mode to Full (strict)
- Under Edge Certificates, ensure:
- Always Use HTTPS: On
- Automatic HTTPS Rewrites: On
Cloudflare automatically provisions SSL certificates including wildcard (*.workspaces.example.com).
Test each component to ensure everything works.
curl https://api.example.com/health
# Should return: {"status":"healthy","timestamp":"..."}

Open https://app.example.com in your browser. You should see the login page.
- Click "Sign in with GitHub"
- Authorize the OAuth application
- You should be redirected back and see the dashboard
curl -I "https://api.example.com/api/agent/download?os=linux&arch=amd64"
# Should return: HTTP/2 200 with Content-Type: application/octet-stream

- Add your Hetzner API token in Settings
- Install the GitHub App on a test repository
- Create a workspace from the dashboard
- Wait for provisioning (2-5 minutes)
- Connect to the terminal
# Stream real-time logs
wrangler tail
# Filter to errors only
wrangler tail --format=pretty --filter error

Nodes use systemd journald for centralized log aggregation. The cloud-init template automatically configures journald and Docker logging on new nodes.
Journald configuration (applied via /etc/systemd/journald.conf.d/sam.conf):
| Setting | Default | Description |
|---|---|---|
| `SystemMaxUse` | `500M` | Max disk space for journal |
| `SystemKeepFree` | `1G` | Minimum free disk to maintain |
| `MaxRetentionSec` | `7day` | Max log retention period |
| `Storage` | `persistent` | Persist logs across reboots |
| `Compress` | `yes` | Compress stored journal entries |
These defaults can be overridden per-node by passing logJournalMaxUse, logJournalKeepFree, and logJournalMaxRetention to the cloud-init generator.
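With the defaults above, the generated drop-in looks roughly like this (a sketch of `/etc/systemd/journald.conf.d/sam.conf`; the actual cloud-init template may order or format keys differently):

```ini
; journald limits applied by cloud-init (defaults from the table above)
[Journal]
Storage=persistent
Compress=yes
SystemMaxUse=500M
SystemKeepFree=1G
MaxRetentionSec=7day
```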
Docker logging: Docker is configured to use the journald log driver, so all container stdout/stderr flows into the same journal. This enables unified log viewing from the control plane UI.
VM Agent environment variables:
| Variable | Default | Description |
|---|---|---|
| `LOG_LEVEL` | `info` | Agent log level (debug, info, warn, error) |
| `LOG_FORMAT` | `json` | Log output format (json or text) |
| `LOG_RETRIEVAL_DEFAULT_LIMIT` | 200 | Default entries per log page |
| `LOG_RETRIEVAL_MAX_LIMIT` | 1000 | Maximum entries per log page |
| `LOG_STREAM_BUFFER_SIZE` | 100 | Catch-up entries sent on stream connect |
| `LOG_READER_TIMEOUT` | `30s` | Timeout for journalctl read commands |
| `LOG_STREAM_PING_INTERVAL` | `30s` | WebSocket ping interval for log stream |
| `LOG_STREAM_PONG_TIMEOUT` | `90s` | WebSocket pong deadline for log stream |
| `SYSINFO_DOCKER_LIST_TIMEOUT` | `10s` | Timeout for `docker ps` command |
| `SYSINFO_DOCKER_STATS_TIMEOUT` | `10s` | Timeout for `docker stats` command |
When you make changes to the VM Agent:
cd packages/vm-agent
make build-all
# Re-upload to R2
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-amd64 --file bin/vm-agent-linux-amd64 --remote
wrangler r2 object put workspaces-assets/agents/vm-agent-linux-arm64 --file bin/vm-agent-linux-arm64 --remote
# Update version
echo '{"version": "1.0.1", "buildDate": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}' > bin/version.json
wrangler r2 object put workspaces-assets/agents/version.json --file bin/version.json --remote

SAM uses two D1 databases:

- DATABASE (`workspaces`): Core platform data (users, nodes, workspaces, projects, tasks)
- OBSERVABILITY_DATABASE (`observability`): Error storage for the admin observability dashboard (spec 023). Isolated from the main database to prevent error volume from affecting core queries.
When schema changes are needed:
# Create a new migration for the main database
wrangler d1 migrations create workspaces your-migration-name
# Create a new migration for the observability database
wrangler d1 migrations create observability your-migration-name
# Apply all migrations to production (run-migrations.ts handles both databases)
pnpm tsx scripts/deploy/run-migrations.ts --env production
# Or apply individually
wrangler d1 migrations apply workspaces --remote
wrangler d1 migrations apply observability --remote

Note: Durable Object (DO) SQLite migrations are managed automatically. Each project's DO runs pending migrations in its constructor via `blockConcurrencyWhile()`. No manual migration step is needed for DO schemas.
SAM uses a per-project Durable Object (PROJECT_DATA) for chat sessions, messages, activity events, and real-time WebSocket streaming. This is configured automatically by Pulumi during deployment.
For manual deployments, ensure your wrangler.toml includes the DO binding:
[[durable_objects.bindings]]
name = "PROJECT_DATA"
class_name = "ProjectData"
[[migrations]]
tag = "v1"
new_sqlite_classes = ["ProjectData"]

Configurable DO limits (set as Worker vars or environment variables):
| Variable | Description | Default |
|---|---|---|
| `MAX_SESSIONS_PER_PROJECT` | Max chat sessions per project | 10000 |
| `MAX_MESSAGES_PER_SESSION` | Max messages per chat session | 10000 |
| `MESSAGE_SIZE_THRESHOLD` | Max message size in bytes | 102400 |
| `ACTIVITY_RETENTION_DAYS` | Days to retain activity events | 90 |
| `SESSION_IDLE_TIMEOUT_MINUTES` | Idle session timeout | 60 |
| `DO_SUMMARY_SYNC_DEBOUNCE_MS` | Debounce for DO-to-D1 summary sync | 5000 |
| `DEFAULT_TASK_AGENT_TYPE` | Agent used for autonomous task execution | `opencode` |
| `WORKSPACE_IDLE_TIMEOUT_MS` | Global default idle timeout before workspace is stopped (overridable per-project) | 7200000 (2h) |
See apps/api/.env.example for the full list of configurable variables.
SAM includes a centralized token refresh proxy for OpenAI Codex OAuth tokens. Codex uses rotating refresh tokens — when one instance refreshes, the old refresh token is permanently invalidated. If two workspaces refresh concurrently, one breaks permanently.
The proxy intercepts Codex refresh requests and serializes them per user via a Durable Object, preventing the race condition. This is enabled by default and requires no additional configuration.
Configurable variables:
| Variable | Description | Default |
|---|---|---|
| `CODEX_REFRESH_PROXY_ENABLED` | Kill switch — set to `"false"` to disable | Enabled |
| `CODEX_REFRESH_LOCK_TIMEOUT_MS` | Per-user lock timeout | 30000 (30s) |
| `CODEX_REFRESH_UPSTREAM_URL` | OpenAI token endpoint | `https://auth.openai.com/oauth/token` |
| `CODEX_REFRESH_UPSTREAM_TIMEOUT_MS` | Upstream request timeout | 10000 (10s) |
| `CODEX_CLIENT_ID` | OpenAI OAuth client ID | `app_EMoamEEZ73f0CkXaXp7hrann` |
| `RATE_LIMIT_CODEX_REFRESH_PER_HOUR` | Max refresh requests per hour per user (enforced atomically via CodexRefreshLock DO `ctx.storage`) | 30 |
| `RATE_LIMIT_CODEX_REFRESH_WINDOW_SECONDS` | Rate limit window in seconds | 3600 (1 hour) |
| `CODEX_EXPECTED_SCOPES` | Comma-separated allowlist of OAuth scopes the upstream may return. Unexpected scopes block the refresh with 502 and the previous token remains valid. Empty string disables validation. | `openid,profile,email,offline_access` |
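The scope allowlist check amounts to a set-membership test over the comma-separated list. An illustrative sketch (not the Worker's actual code):

```shell
expected="openid,profile,email,offline_access"   # CODEX_EXPECTED_SCOPES
returned="openid profile email offline_access"   # scopes in the upstream response

for s in $returned; do
  # Wrap both sides in commas so we match whole scope names only
  case ",$expected," in
    *",$s,"*) ;;   # scope is allowlisted
    *) echo "unexpected scope: $s (refresh would be blocked with 502)"; exit 1 ;;
  esac
done
echo "scopes ok"
```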
If you want to expose the zero-friction /try URL-to-workspace flow on your deployment, see trial-configuration.md for the required TRIAL_CLAIM_TOKEN_SECRET secret, tunable env vars (monthly cap, workspace TTL, data retention), and the KV-backed kill switch.
Security keys are managed by Pulumi and normally don't need rotation. If you need to rotate keys:
Option 1: Force Pulumi to recreate keys

```bash
# Remove protection from key resources (temporarily)
cd infra
pulumi state unprotect "urn:pulumi:prod::infra::random:index/randomId:RandomId::encryption-key"
pulumi state unprotect "urn:pulumi:prod::infra::tls:index/privateKey:PrivateKey::jwt-signing-key"

# Delete the resources
pulumi state delete "urn:pulumi:prod::infra::random:index/randomId:RandomId::encryption-key"
pulumi state delete "urn:pulumi:prod::infra::tls:index/privateKey:PrivateKey::jwt-signing-key"

# Re-deploy to create new keys
pulumi up
```

Option 2: Manual rotation
```bash
# Generate new keys locally
pnpm tsx scripts/deploy/generate-keys.ts

# Update secrets directly
cd apps/api
wrangler secret put JWT_PRIVATE_KEY
wrangler secret put JWT_PUBLIC_KEY
wrangler secret put ENCRYPTION_KEY
```

Warning: Rotating keys will:
- Invalidate all active terminal sessions (JWT keys)
- Make existing encrypted credentials unreadable (`CREDENTIAL_ENCRYPTION_KEY`, or `ENCRYPTION_KEY` if the override is not set); users will need to re-enter their Hetzner tokens
Cause: PULUMI_CONFIG_PASSPHRASE doesn't match the one used when state was created.
Fix:
- Use the same passphrase used during initial deployment
- If you lost the passphrase, delete the stack in R2 and start fresh:
```bash
# In Cloudflare Dashboard → R2 → sam-pulumi-state bucket
# Delete the .pulumi/ folder for your stack
```
Cause: R2 backend connection failed or bucket doesn't exist.
Fix:
- Verify the Pulumi state bucket exists in Cloudflare R2
- Check R2 credentials (`R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`) in your GitHub Environment
- Verify the bucket name matches the `PULUMI_STATE_BUCKET` environment variable (default: `sam-pulumi-state`)
Cause: First deployment or stack was removed.
Fix: This is normal for first deployments. The workflow automatically creates the stack. If you see this after a previous deployment, the state may have been deleted.
Cause: Resource was created outside Pulumi or imported incorrectly.
Fix:
- If the resource should be managed by Pulumi, import it: `pulumi import cloudflare:index/d1Database:D1Database sam-database <database-id>`
- Or delete the resource in Cloudflare Dashboard and re-run deployment
Cause: Cron triggers (used for provisioning timeout checks) require the account's workers.dev subdomain to be initialized. The deploy workflow handles this automatically via the Cloudflare API, but it may fail if the API token lacks the Workers Scripts permission.
Fix:
- Automatic: The deployment workflow includes an "Ensure workers.dev Subdomain" step that initializes it. Verify your API token has the `Account: Workers Scripts (Edit)` permission.
- Manual: Go to Cloudflare Dashboard → Workers & Pages → click on any worker → Settings → Domains & Routes → enable the `workers.dev` route.
Cause: wrangler pages deploy needs the account ID but doesn't read it from wrangler.toml.
Fix: Ensure CF_ACCOUNT_ID is set as a secret in your GitHub Environment. The deploy workflow passes it as CLOUDFLARE_ACCOUNT_ID to the Pages deploy step.
Cause: Worker deployed but configuration issue preventing startup.
Fix:
- Check worker logs: `wrangler tail`
- Verify all secrets are set correctly
- Check D1 migrations were applied
Cause: The CF_API_TOKEN is missing the "Workers Observability (Read)" permission, which is required for the admin log viewer.
Fix:
- Go to Cloudflare Dashboard → My Profile → API Tokens
- Edit the token used for SAM
- Add permission: Account → Workers Observability → Read
- Save the token
- If the token was regenerated, update the `CF_API_TOKEN` secret in your GitHub Environment and redeploy
Cause: The CF_API_TOKEN is missing the account-level "AI Gateway (Edit)" permission. The deploy workflow configures the account AI Gateway before deploying the Worker.
Fix:
- Go to Cloudflare Dashboard → My Profile → API Tokens
- Edit the token used for SAM
- Add permission: Account → AI Gateway → Edit
- Save the token
- If the token was regenerated, update the `CF_API_TOKEN` secret in your GitHub Environment and redeploy
Cause: Callback URL mismatch or incorrect GitHub App settings
Fix:
- Check your GitHub App's Callback URL matches exactly: `https://api.example.com/api/auth/callback/github`
- Check your GitHub App's Setup URL is set to: `https://api.example.com/api/github/callback`
- Ensure "Request user authorization (OAuth) during installation" is unchecked — when checked, it disables the Setup URL and causes post-installation redirects to hit BetterAuth, which fails
- Ensure HTTPS is used (not HTTP)
- Verify the domain in Cloudflare is active
Cause: Migrations haven't been applied
Fix:
```bash
wrangler d1 migrations apply workspaces --remote
```

Cause: R2 bucket not configured or binaries not uploaded
Fix:
- Verify R2 bucket exists: `wrangler r2 bucket list`
- Re-upload binaries (see Step 7 above)
Cause: VM provisioning failed or agent didn't start
Fix:
- Check Hetzner console for VM status
- If VM is running, SSH in and check: `systemctl status vm-agent`
- View cloud-init logs: `cat /var/log/cloud-init-output.log`
Cause: The GitHub App private key is stored in an unsupported format. GitHub App keys are generated as PKCS#1 (-----BEGIN RSA PRIVATE KEY-----), and the API automatically converts them to PKCS#8 format at runtime.
Fix:
- Ensure the key is stored either as raw PEM or base64-encoded PEM (both work)
- For base64 encoding: `cat your-key.pem | base64 -w0`
- For raw PEM via wrangler: `cat your-key.pem | wrangler secret put GITHUB_APP_PRIVATE_KEY`
- Make sure the key isn't truncated — PKCS#1 RSA 2048 keys are ~1700 characters
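If you want to sanity-check a key locally before uploading the secret, Node's built-in crypto module can reproduce the same PKCS#1 to PKCS#8 normalization. This is a local verification sketch, not SAM code; `toPkcs8` is an illustrative helper:

```typescript
import { createPrivateKey, generateKeyPairSync } from "node:crypto";

// Normalize a PKCS#1 RSA key (what GitHub generates) to PKCS#8,
// mirroring the conversion the API performs at runtime.
function toPkcs8(pem: string): string {
  return createPrivateKey(pem).export({ type: "pkcs8", format: "pem" }) as string;
}

// Demo with a throwaway RSA key in PKCS#1 form:
const { privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const pkcs1 = privateKey.export({ type: "pkcs1", format: "pem" }) as string;
const pkcs8 = toPkcs8(pkcs1); // re-encoded with the PKCS#8 header
```

If `createPrivateKey` throws here, the PEM you were about to upload is malformed or truncated.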
Cause: Key mismatch between API and expectations
Fix:
- Ensure JWT_PUBLIC_KEY and JWT_PRIVATE_KEY are from the same key pair
- Check keys aren't truncated (base64 encoding)
- Regenerate keys if needed
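A quick way to confirm the two PEMs belong together before re-uploading them is to derive the public key from the private key and compare. Sketch using Node's crypto module (`keysMatch` is an illustrative helper; the demo keys below are throwaway, and the actual key type is whatever Pulumi's tls provider generated):

```typescript
import { createPrivateKey, createPublicKey, generateKeyPairSync } from "node:crypto";

// True if publicPem is the public half of privatePem (works for any
// key type, since the public key is derivable from the private key).
function keysMatch(privatePem: string, publicPem: string): boolean {
  const derived = createPublicKey(createPrivateKey(privatePem))
    .export({ type: "spki", format: "pem" }) as string;
  const given = createPublicKey(publicPem)
    .export({ type: "spki", format: "pem" }) as string;
  return derived === given;
}

// Demo with two throwaway Ed25519 pairs:
const a = generateKeyPairSync("ed25519");
const b = generateKeyPairSync("ed25519");
const aPriv = a.privateKey.export({ type: "pkcs8", format: "pem" }) as string;
const aPub = a.publicKey.export({ type: "spki", format: "pem" }) as string;
const bPub = b.publicKey.export({ type: "spki", format: "pem" }) as string;
```

Run this against the PEMs you plan to store as `JWT_PRIVATE_KEY` and `JWT_PUBLIC_KEY`; a `false` result means the pair is mismatched.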
Cause: DNS not propagated or misconfigured
Fix:
- Verify nameservers changed at registrar
- Check DNS records in Cloudflare dashboard
- Wait up to 24 hours for propagation
- Test with: `dig +short api.example.com`
| Component | Free Tier Limit | Paid Overage |
|---|---|---|
| Cloudflare Workers | 100K requests/day | $0.15/million |
| Cloudflare D1 | 5M rows read/day | $0.001/million |
| Cloudflare KV | 100K reads/day | $0.50/million |
| Cloudflare R2 | 10GB storage | $0.015/GB/month |
| Cloudflare Pages | Unlimited | Free |
Typical SAM deployment: Stays within free tier for small to medium usage.
Users provide their own Hetzner API token. Workspace VMs are billed to their account:
| VM Size | Specs | Hourly | Monthly |
|---|---|---|---|
| Small (CX22) | 2 vCPU, 4GB RAM | €0.006 (~$0.007) | €3.79 (~$4.15) |
| Medium (CX32) | 4 vCPU, 8GB RAM | €0.011 (~$0.012) | €6.80 (~$7.50) |
| Large (CX42) | 8 vCPU, 16GB RAM | €0.027 (~$0.030) | €16.40 (~$18) |
VMs are billed hourly until they are explicitly stopped or deleted.
- Rotate Keys Regularly: Generate new JWT and encryption keys quarterly
- Minimal GitHub App Permissions: Only `Contents: Read and write` (required for committing) and `Metadata: Read-only`
- No Embedded Secrets: Bootstrap tokens ensure no secrets in cloud-init
- HTTPS Only: All traffic is encrypted via Cloudflare
- Session Security: BetterAuth handles secure session management
- Issues: GitHub Issues
- Documentation: docs/
- Architecture: Architecture Decision Records
Last updated: 2026-04-14