OpenClaw Companion - Architecture Document

Generated by Architecture Council - 2026-02-01


Council Perspectives

1. iOS Architect 🏗️

Protocol Design Decision: WebSocket

I recommend WebSocket running over TLS (wss://) as the gateway protocol:

Pros:
- Native URLSessionWebSocketTask in iOS 13+
- Bidirectional real-time communication
- Built-in ping/pong for connection health
- Efficient for canvas updates and streaming
- Works through Tailscale without issues

Cons of alternatives:
- gRPC: Additional dependencies (protobuf), complex setup
- REST: Polling overhead, not suitable for real-time canvas
- MQTT: Overkill, designed for IoT pub/sub

App Architecture:

┌──────────────────────────────────────────────────────────────┐
│                      OpenClaw Companion                      │
├──────────────────────────────────────────────────────────────┤
│  Presentation Layer (SwiftUI)                                │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐     │
│  │ ChatView    │ │ CanvasView  │ │ CapabilitySheetView │     │
│  └─────────────┘ └─────────────┘ └─────────────────────┘     │
├──────────────────────────────────────────────────────────────┤
│  Domain Layer                                                │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐     │
│  │ ChatManager │ │CanvasEngine │ │CapabilityController │     │
│  └─────────────┘ └─────────────┘ └─────────────────────┘     │
├──────────────────────────────────────────────────────────────┤
│  Data Layer (OpenClawKit Package)                            │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ GatewayClient (WebSocket) │ SessionStore │ KeychainMgr │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

Concurrency Pattern:

  • Swift Concurrency (async/await) throughout
  • Actors for shared state (GatewayActor, CapabilityActor)
  • AsyncStream for WebSocket messages
  • @MainActor for all UI state
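The pattern above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `IncomingMessage`, `ChatStateActor`, `makeMessageStream`, and `drain` are hypothetical names, and the stream is fed from an array instead of a live WebSocket.

```swift
import Foundation

// Hypothetical message type; the real schema is defined by the gateway protocol.
struct IncomingMessage: Sendable, Equatable {
    let text: String
}

// Actor that serializes all access to shared chat state.
actor ChatStateActor {
    private(set) var transcript: [String] = []

    func append(_ message: IncomingMessage) {
        transcript.append(message.text)
    }
}

// Exposes a message source as an AsyncStream. A real client would yield
// from its WebSocket receive loop instead of from an array.
func makeMessageStream(from messages: [IncomingMessage]) -> AsyncStream<IncomingMessage> {
    AsyncStream { continuation in
        for message in messages {
            continuation.yield(message)
        }
        continuation.finish()
    }
}

// Drains the stream into the actor and returns the resulting transcript.
func drain(_ stream: AsyncStream<IncomingMessage>, into state: ChatStateActor) async -> [String] {
    for await message in stream {
        await state.append(message)
    }
    return await state.transcript
}
```

Because every mutation goes through the actor, the consumer loop and any UI code reading the transcript cannot race with each other.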

Data Flow:

// Message flow
Gateway WebSocket
  → AsyncStream<GatewayMessage>
  → GatewayClient.messages
  → ChatManager/CanvasEngine (Actors)
  → @Published state
  → SwiftUI Views
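One way to picture the fan-out step in this flow is a small router keyed on the message type. The sketch below is an assumption about how dispatch might look: the source only names ChatManager and CanvasEngine as consumers, so the `capabilityController` and `attentionState` destinations are hypothetical.

```swift
// Type strings mirror the gateway message types used in the protocol.
enum MessageKind: String {
    case text
    case canvasUpdate = "canvas_update"
    case capabilityRequest = "capability_request"
    case attention
    case status
}

// Hypothetical destinations; only the first two appear in the data flow above.
enum Destination: Equatable {
    case chatManager, canvasEngine, capabilityController, attentionState
}

// Returns nil for unknown types so the caller can log and drop them.
func destination(forType rawType: String) -> Destination? {
    guard let kind = MessageKind(rawValue: rawType) else { return nil }
    switch kind {
    case .text: return .chatManager
    case .canvasUpdate: return .canvasEngine
    case .capabilityRequest: return .capabilityController
    case .attention, .status: return .attentionState
    }
}
```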

2. Security Architect 🔐

Authentication & Pairing:

Phase 1: QR Code Pairing
┌────────┐     ┌─────────┐     ┌─────────┐
│ Phone  │────▶│ QR Scan │────▶│ Gateway │
└────────┘     └─────────┘     └─────────┘
                    │
                    ▼
         ┌──────────────────────┐
         │ {                    │
         │   endpoint: "..."    │
         │   fingerprint: "..." │
         │   challenge: "..."   │
         │ }                    │
         └──────────────────────┘
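A minimal sketch of reading the scanned payload, assuming the QR code carries the three fields above as a JSON object (the encoding is not specified in this document). `PairingPayload`, `PairingError`, and `parsePairingPayload` are hypothetical names, and the `wss://` check is an added assumption that follows from the WebSocket-over-TLS decision.

```swift
import Foundation

// Field names come from the pairing diagram; the JSON encoding is assumed.
struct PairingPayload: Codable, Equatable {
    let endpoint: String
    let fingerprint: String
    let challenge: String
}

enum PairingError: Error, Equatable {
    case malformedPayload
    case insecureEndpoint
}

// Decodes the scanned QR text and rejects endpoints that are not wss://.
func parsePairingPayload(_ qrText: String) throws -> PairingPayload {
    guard let payload = try? JSONDecoder().decode(PairingPayload.self, from: Data(qrText.utf8)) else {
        throw PairingError.malformedPayload
    }
    guard payload.endpoint.hasPrefix("wss://") else {
        throw PairingError.insecureEndpoint
    }
    return payload
}
```

Rejecting the payload before any network call keeps a malformed or downgraded QR code from ever reaching the connection layer.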

Phase 2: Key Exchange
- Client generates Ed25519 keypair
- Sends public key + device info to gateway
- Gateway signs and returns device token
- Token stored in iOS Keychain (kSecAttrAccessibleAfterFirstUnlock)

Phase 3: Session Establishment
- WebSocket connects with device token in header
- Gateway validates token, returns session ID
- All further messages signed with session key
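The first step of Phase 3 might look like the sketch below. The source only says the token travels "in header", so the `Authorization: Bearer` scheme is an assumption, as are the function name `makeSessionRequest` and the 15-second timeout.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives in this module on Linux
#endif

// Builds the WebSocket upgrade request that carries the device token.
// Header name and scheme are assumptions, not part of the documented protocol.
func makeSessionRequest(endpoint: URL, deviceToken: String) -> URLRequest {
    var request = URLRequest(url: endpoint)
    request.setValue("Bearer \(deviceToken)", forHTTPHeaderField: "Authorization")
    request.timeoutInterval = 15
    return request
}
```

On iOS the resulting request can be passed to `URLSession.webSocketTask(with:)`, after which `resume()` starts the handshake.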

mTLS Implementation:

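import Foundation
import Security

// URLSession with client certificate
let config = URLSessionConfiguration.default
config.tlsMinimumSupportedProtocolVersion = .TLSv12

// Client certificate from Keychain (loadFromKeychain() is a placeholder helper)
let identity: SecIdentity = loadFromKeychain()

// Placeholder host and port; they must match the paired gateway endpoint
let protectionSpace = URLProtectionSpace(
    host: "gateway.example", port: 443, protocol: NSURLProtectionSpaceHTTPS,
    realm: nil, authenticationMethod: NSURLAuthenticationMethodClientCertificate
)
config.urlCredentialStorage?.set(
    URLCredential(identity: identity, certificates: nil, persistence: .forSession),
    for: protectionSpace
)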

Capability Token Design:

{
  "capability": "camera.front.once",
  "granted_at": "2026-02-01T12:00:00Z",
  "expires_at": "2026-02-01T12:05:00Z",
  "session_id": "abc123",
  "nonce": "random_bytes",
  "signature": "gateway_signature"
}
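A minimal sketch of decoding this token and checking its validity window on the client. `CapabilityToken.isUsable` and `decodeCapabilityToken` are hypothetical names, and signature verification is deliberately left out because the signing scheme is not specified here.

```swift
import Foundation

// Mirrors the JSON fields shown above; the signature is carried but not verified.
struct CapabilityToken: Codable {
    let capability: String
    let grantedAt: Date
    let expiresAt: Date
    let sessionId: String
    let nonce: String
    let signature: String

    // True only inside [grantedAt, expiresAt) and for the matching session.
    func isUsable(at now: Date, inSession session: String) -> Bool {
        sessionId == session && now >= grantedAt && now < expiresAt
    }
}

func decodeCapabilityToken(_ json: Data) throws -> CapabilityToken {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase  // granted_at -> grantedAt
    decoder.dateDecodingStrategy = .iso8601              // 2026-02-01T12:00:00Z
    return try decoder.decode(CapabilityToken.self, from: json)
}
```

Treating the expiry as exclusive means a token granted for five minutes stops working at exactly `expires_at`, which keeps the client and gateway from disagreeing about the final instant.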

Sandboxing:

  • All gateway data isolated in App Group container
  • No cross-app communication except via Gateway
  • Background refresh disabled for sensitive capabilities
  • Screen recording detection → auto-revoke camera capability

3. UX Visionary 🎨

Attention Model Implementation:

┌─────────────────────────────────────────┐
│           Attention Indicator           │
├─────────────────────────────────────────┤
│ PASSIVE   │  ○ ○ ○ ○  │ No indicator    │
│ NOTIFY    │  ● ○ ○ ○  │ Badge only      │
│ ACTIVE    │  ● ● ○ ○  │ Subtle glow     │
│ INTERRUPT │  ● ● ● ○  │ Modal appears   │
│ URGENT    │  ● ● ● ●  │ Haptic + sound  │
└─────────────────────────────────────────┘
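The five levels map naturally onto an ordered enum. The sketch below is one possible model: the raw values match the `attention_level` strings used in the gateway protocol, while the property names `filledDots`, `presentation`, and `dots` are hypothetical.

```swift
// Ordered attention levels; declaration order defines severity.
enum AttentionLevel: String, CaseIterable, Comparable {
    case passive, notify, active, interrupt, urgent

    // Number of filled dots in the indicator (0 through 4).
    var filledDots: Int { Self.allCases.firstIndex(of: self)! }

    // UI treatment taken from the table above.
    var presentation: String {
        switch self {
        case .passive: return "No indicator"
        case .notify: return "Badge only"
        case .active: return "Subtle glow"
        case .interrupt: return "Modal appears"
        case .urgent: return "Haptic + sound"
        }
    }

    // Renders the four-dot indicator as text.
    var dots: String {
        (0..<4).map { $0 < filledDots ? "●" : "○" }.joined(separator: " ")
    }

    static func < (lhs: Self, rhs: Self) -> Bool {
        lhs.filledDots < rhs.filledDots
    }
}
```

Making the enum `Comparable` lets the UI express rules such as "show a modal for anything at or above `.interrupt`" without listing cases by hand.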

Panic Abort Design:

  • Triple-tap anywhere → Immediate session termination
  • Shake gesture → Quick capability revoke sheet
  • Volume buttons (long press) → Emergency disconnect
  • Visual: Red pulsing border when agent has control

Capability Grant Flow (User-First):

┌────────────────────────────────────────┐
│  🎤 Microphone Access Request          │
├────────────────────────────────────────┤
│                                        │
│  Agent "Claude" is requesting:         │
│  • Microphone access                   │
│  • Duration: 60 seconds                │
│  • Purpose: Voice transcription        │
│                                        │
│  ┌──────────────┐  ┌──────────────┐    │
│  │   Deny       │  │  Allow 60s   │    │
│  └──────────────┘  └──────────────┘    │
│                                        │
│  [ ] Remember for this session         │
│                                        │
│  ⓘ You can revoke anytime by shaking   │
└────────────────────────────────────────┘
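The state behind this sheet can be sketched as a small ledger of time-boxed grants. All names here (`GrantDecision`, `GrantLedger`, `needsPrompt`) are hypothetical, and the reading of "Remember for this session" as "skip the sheet for later requests of the same capability" is an assumption.

```swift
import Foundation

// Outcome of the grant sheet; the flag corresponds to the session checkbox.
enum GrantDecision: Equatable {
    case deny
    case allow(seconds: TimeInterval, rememberForSession: Bool)
}

// Tracks time-boxed grants and supports revoking everything at once.
struct GrantLedger {
    private var expiry: [String: Date] = [:]
    private var remembered: Set<String> = []

    mutating func record(_ decision: GrantDecision, for capability: String, at now: Date) {
        switch decision {
        case .deny:
            expiry[capability] = nil
        case .allow(let seconds, let remember):
            expiry[capability] = now.addingTimeInterval(seconds)
            if remember { remembered.insert(capability) }
        }
    }

    func isGranted(_ capability: String, at now: Date) -> Bool {
        guard let end = expiry[capability] else { return false }
        return now < end
    }

    // Assumed meaning of the checkbox: remembered capabilities skip the sheet.
    func needsPrompt(for capability: String) -> Bool {
        !remembered.contains(capability)
    }

    // Intended to be called from the shake-to-revoke gesture.
    mutating func revokeAll() {
        expiry.removeAll()
        remembered.removeAll()
    }
}
```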

Canvas Interaction Model:

  • Tap: Select/activate element
  • Long press: Context menu
  • Two-finger pan: Navigate canvas
  • Pinch: Zoom
  • Draw mode: Explicit toggle (pencil icon)

Watch UI:

┌─────────────────────┐
│   🟢 Claude Active  │
│                     │
│   "Checking your    │
│    calendar..."     │
│                     │
│   [🎤 Speak]        │
│                     │
│   ⚡ 2 pending      │
└─────────────────────┘

4. AI/Agent Specialist 🤖

Gateway Protocol (WebSocket Messages):

// Client → Gateway
interface ClientMessage {
  type: "text" | "voice" | "canvas_event" | "capability_response" | "heartbeat";
  session_id: string;
  timestamp: number;
  payload: Record<string, unknown>;
}

// Gateway → Client
interface GatewayMessage {
  type: "text" | "canvas_update" | "capability_request" | "attention" | "status";
  agent_id: string;
  attention_level: "passive" | "notify" | "active" | "interrupt" | "urgent";
  payload: Record<string, unknown>;
}

// Canvas Commands
interface CanvasCommand {
  action: "clear" | "draw" | "update" | "animate";
  elements: CanvasElement[];
  transition?: "instant" | "fade" | "slide";
}

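// Rect, Style, and Data are not defined in this document; Data presumably stands for a binary payload
interface CanvasElement {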
  id: string;
  type: "text" | "image" | "shape" | "video" | "highlight";
  bounds: Rect;
  content: string | Data;
  style?: Style;
  interactive?: boolean;
}
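On the iOS side, these interfaces would typically be mirrored as `Codable` types. The sketch below covers `ClientMessage` only and makes two simplifying assumptions: the payload is narrowed to `[String: String]` (the protocol allows arbitrary JSON values), and `timestamp` is treated as a plain number without a stated unit. `encodeFrame` is a hypothetical helper.

```swift
import Foundation

// Swift counterpart of the ClientMessage interface above.
struct ClientMessage: Codable, Equatable {
    enum Kind: String, Codable {
        case text, voice, heartbeat
        case canvasEvent = "canvas_event"
        case capabilityResponse = "capability_response"
    }

    let type: Kind
    let sessionId: String
    let timestamp: Double
    let payload: [String: String]  // narrowed for this sketch

    enum CodingKeys: String, CodingKey {
        case type, timestamp, payload
        case sessionId = "session_id"
    }
}

// Serializes a message to the JSON text sent in a WebSocket text frame.
func encodeFrame(_ message: ClientMessage) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]  // stable key order for tests and logs
    return String(decoding: try encoder.encode(message), as: UTF8.self)
}
```

Explicit `CodingKeys` keep the wire names (`session_id`) independent of Swift naming, so renaming a property cannot silently change the protocol.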

Multimodal Input Pipeline:

Voice Input:
  Microphone → AVAudioEngine → Speech.framework (local)
    → Text + confidence → Gateway

OR (streaming):
  Microphone → AVAudioEngine → Opus encode
    → WebSocket binary frames → Gateway

Drawing Input:
  PencilKit PKDrawing → Rasterize to PNG
    → Base64 → Gateway

Camera:
  AVCaptureSession → CVPixelBuffer → JPEG compress
    → Base64 or separate upload endpoint → Gateway
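The Base64 step in the drawing and camera paths can be sketched without any capture frameworks, starting from already-encoded image bytes. The function names and payload field names (`mime_type`, `byte_count`, `data`) are illustrative assumptions, not part of the documented protocol.

```swift
import Foundation

// Wraps encoded image bytes (PNG from a drawing, JPEG from the camera)
// in a JSON-friendly payload. Field names are illustrative only.
func makeImagePayload(bytes: Data, mimeType: String) -> [String: String] {
    [
        "mime_type": mimeType,
        "byte_count": String(bytes.count),
        "data": bytes.base64EncodedString()
    ]
}

// Length of the padded Base64 text for a given number of raw bytes.
// Base64 emits 4 characters per 3 input bytes, rounded up.
func base64Length(forByteCount count: Int) -> Int {
    ((count + 2) / 3) * 4
}
```

The roughly one-third size increase from Base64 is one reason a large camera frame may be better served by the separate upload endpoint than by an inline payload.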

Canvas Rendering Strategy:

Option A: SwiftUI Canvas (Recommended for MVP)
- Native, no dependencies
- Good for 2D graphics, text, images
- Limited 3D support

Option B: SpriteKit (If heavy animation needed)
- Better for particle effects, complex animations
- More memory overhead

Option C: Metal (Future, not MVP)
- For 3D, shaders, AR integration

Recommendation: SwiftUI Canvas for MVP, with ability to embed SpriteKit scenes for specific animations.


5. Apple Guidelines Expert 📱

App Store Compliance Analysis:

| Feature | App Store Risk | Mitigation |
|---|---|---|
| AI Chat | ✅ Low | Common pattern (ChatGPT, etc.) |
| Camera Access | ✅ Low | Clear purpose string |
| Background Location | ⚠️ Medium | Use "While Using" only |
| Remote Control | ⚠️ Medium | User must initiate all actions |
| Canvas Scripting | ⚠️ Medium | No arbitrary code execution |
| Sensor Streaming | ⚠️ Medium | Clear disclosure in UI |
| BLE Scanning | 🔴 High | Only with explicit use case |

Required Privacy Strings (Info.plist):

<key>NSCameraUsageDescription</key>
<string>Share photos and video with your AI assistant</string>

<key>NSMicrophoneUsageDescription</key>
<string>Send voice messages to your AI assistant</string>

<key>NSLocationWhenInUseUsageDescription</key>
<string>Share your location with your AI assistant when asked</string>

<key>NSBluetoothAlwaysUsageDescription</key>
<string>Connect to nearby devices when requested by your assistant</string>

Feature Gating Strategy:

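// Capability cases are inferred from the sets used below
enum Capability: CaseIterable, Hashable {
    case camera, microphone, locationWhenInUse, canvas
    case backgroundLocation, ble, screenRecording
}

enum DistributionChannel {
    case appStore      // Conservative
    case testFlight    // Extended
    case enterprise    // Full

    var allowedCapabilities: Set<Capability> {
        switch self {
        case .appStore:
            return [.camera, .microphone, .locationWhenInUse, .canvas]
        case .testFlight:
            // Everything in the App Store set plus the extended capabilities
            return Self.appStore.allowedCapabilities
                .union([.backgroundLocation, .ble, .screenRecording])
        case .enterprise:
            return Set(Capability.allCases)
        }
    }
}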

TestFlight Strategy:

  1. Submit conservative build to App Store review
  2. Enable additional features via remote config for TestFlight users
  3. Monitor for policy violations before wider release
  4. Document all capabilities clearly in App Review notes

Architecture Decisions Summary

| Question | Decision | Rationale |
|---|---|---|
| Protocol | WebSocket/TLS | Native support, bidirectional, efficient |
| Auth | Ed25519 + JWT-like tokens | Modern, fast, compact |
| Canvas | SwiftUI Canvas | Native, sufficient for MVP |
| Watch Comms | WatchConnectivity | Apple's supported framework for iPhone-to-Watch messaging |
| State | Swift Actors | Thread-safe, modern |
| Storage | Keychain + UserDefaults | Secure for tokens, fast for prefs |

MVP vs Full Feature Set

MVP (v1.0)

  • QR code pairing
  • WebSocket connection
  • Text chat (send/receive)
  • Basic canvas (text + images)
  • Capability request/grant UI
  • Watch status glance
  • Panic abort (shake)

v1.1

  • Voice input (PTT)
  • Canvas drawing input
  • Camera capture
  • Multiple agent support
  • Watch voice input

v2.0

  • Video playback in canvas
  • Streaming voice (duplex)
  • BLE device control
  • Background location (TestFlight)
  • Canvas animations

File Structure

OpenClawCompanion/
├── App/
│   ├── OpenClawCompanionApp.swift
│   ├── Views/
│   │   ├── Chat/
│   │   ├── Canvas/
│   │   ├── Settings/
│   │   └── Pairing/
│   ├── ViewModels/
│   └── Resources/
├── Watch/
│   ├── OpenClawWatchApp.swift
│   ├── Views/
│   └── Complications/
├── Shared/
│   ├── Models/
│   └── Extensions/
├── Packages/
│   └── OpenClawKit/
│       ├── Sources/
│       │   ├── Networking/
│       │   ├── Security/
│       │   └── Protocol/
│       └── Tests/
├── project.yml          # XcodeGen
└── README.md

Document version: 1.0
Next review: After MVP implementation