Generated by Architecture Council - 2026-02-01
Protocol Design Decision: WebSocket
I recommend WebSocket over TLS for the gateway protocol:
Pros:
- Native URLSessionWebSocketTask in iOS 13+
- Bidirectional real-time communication
- Built-in ping/pong for connection health
- Efficient for canvas updates and streaming
- Works through Tailscale without issues
Cons of alternatives:
- gRPC: Additional dependencies (protobuf), complex setup
- REST: Polling overhead, not suitable for real-time canvas
- MQTT: Overkill, designed for IoT pub/sub
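The ping/pong point above can be turned into a simple staleness check. The sketch below is illustrative only: `HeartbeatMonitor`, the 15-second interval, and the two-missed-pongs threshold are assumptions, not part of the design. On iOS the pong timestamps would presumably come from `URLSessionWebSocketTask.sendPing(pongReceiveHandler:)`.

```typescript
// Hypothetical helper: decides from pong timestamps whether the connection is stale.
class HeartbeatMonitor {
  private readonly intervalMs: number; // ping period (assumed value, not from the design)
  private readonly maxMissed: number;  // missed pongs tolerated before reconnecting
  private lastPongAt: number;          // ms timestamp of the most recent pong

  constructor(intervalMs: number, maxMissed: number, startedAt: number) {
    this.intervalMs = intervalMs;
    this.maxMissed = maxMissed;
    this.lastPongAt = startedAt;
  }

  recordPong(now: number): void {
    this.lastPongAt = now;
  }

  // Number of whole ping intervals that have elapsed without a pong.
  missedPongs(now: number): number {
    return Math.max(0, Math.floor((now - this.lastPongAt) / this.intervalMs));
  }

  isStale(now: number): boolean {
    return this.missedPongs(now) >= this.maxMissed;
  }
}

const monitor = new HeartbeatMonitor(15000, 2, 0);
monitor.recordPong(10000);
console.log(monitor.isStale(30000)); // false: only one interval missed
console.log(monitor.isStale(40000)); // true: two intervals missed
```

A stale result would typically trigger a reconnect with backoff; that policy is left open here.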
App Architecture:
┌─────────────────────────────────────────────────────────────┐
│                     OpenClaw Companion                      │
├─────────────────────────────────────────────────────────────┤
│ Presentation Layer (SwiftUI)                                │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐     │
│ │ ChatView    │ │ CanvasView  │ │ CapabilitySheetView │     │
│ └─────────────┘ └─────────────┘ └─────────────────────┘     │
├─────────────────────────────────────────────────────────────┤
│ Domain Layer                                                │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐     │
│ │ ChatManager │ │CanvasEngine │ │CapabilityController │     │
│ └─────────────┘ └─────────────┘ └─────────────────────┘     │
├─────────────────────────────────────────────────────────────┤
│ Data Layer (OpenClawKit Package)                            │
│ ┌────────────────────────────────────────────────────────┐  │
│ │ GatewayClient (WebSocket) │ SessionStore │ KeychainMgr │  │
│ └────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Concurrency Pattern:
- Swift Concurrency (async/await) throughout
- Actors for shared state (GatewayActor, CapabilityActor)
- AsyncStream for WebSocket messages
- @MainActor for all UI state
Data Flow:
// Message flow
Gateway WebSocket
  → AsyncStream<GatewayMessage>
  → GatewayClient.messages
  → ChatManager/CanvasEngine (Actors)
  → @Published state
  → SwiftUI Views
Authentication & Pairing:
Phase 1: QR Code Pairing
┌────────┐     ┌─────────┐     ┌─────────┐
│ Phone  │────▶│ QR Scan │────▶│ Gateway │
└────────┘     └─────────┘     └─────────┘
                    │
                    ▼
        ┌──────────────────────┐
        │ {                    │
        │   endpoint: "..."    │
        │   fingerprint: "..." │
        │   challenge: "..."   │
        │ }                    │
        └──────────────────────┘
Phase 2: Key Exchange
- Client generates Ed25519 keypair
- Sends public key + device info to gateway
- Gateway signs and returns device token
- Token stored in iOS Keychain (kSecAttrAccessibleAfterFirstUnlock)
Phase 3: Session Establishment
- WebSocket connects with device token in header
- Gateway validates token, returns session ID
- All further messages signed with session key
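Before Phase 2 begins, the client has to decide whether the scanned QR text is usable. The following is a minimal sketch of that check, assuming the payload is JSON with the three fields shown in the Phase 1 diagram. The function name and the rule that only `wss://` endpoints are accepted are assumptions made for illustration.

```typescript
interface PairingPayload {
  endpoint: string;
  fingerprint: string;
  challenge: string;
}

// Returns the parsed payload, or null if the QR text is not a usable pairing payload.
function parsePairingPayload(qrText: string): PairingPayload | null {
  let raw: unknown;
  try {
    raw = JSON.parse(qrText);
  } catch {
    return null; // not JSON at all
  }
  if (typeof raw !== "object" || raw === null) return null;

  const fields = raw as Record<string, unknown>;
  const endpoint = fields["endpoint"];
  const fingerprint = fields["fingerprint"];
  const challenge = fields["challenge"];
  if (
    typeof endpoint !== "string" ||
    typeof fingerprint !== "string" ||
    typeof challenge !== "string"
  ) {
    return null;
  }
  // Assumption: only TLS-protected WebSocket endpoints are accepted.
  if (!endpoint.startsWith("wss://")) return null;
  if (fingerprint.length === 0 || challenge.length === 0) return null;
  return { endpoint, fingerprint, challenge };
}
```

Rejecting the payload early keeps malformed or downgraded endpoints out of the key exchange; the fingerprint would still need to be compared against the gateway's certificate when the connection is opened.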
mTLS Implementation:
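// URLSession with client certificate (mTLS)
import Foundation
import Security

let config = URLSessionConfiguration.default
config.tlsMinimumSupportedProtocolVersion = .TLSv12
// Client identity (certificate + private key) from the Keychain.
// loadFromKeychain() is a helper defined elsewhere.
let identity: SecIdentity = loadFromKeychain()
// Protection space for the gateway; host and port below are placeholders.
let protectionSpace = URLProtectionSpace(
    host: "gateway.example", port: 443, protocol: "https", realm: nil,
    authenticationMethod: NSURLAuthenticationMethodClientCertificate)
// If the stored credential is not offered during the handshake, answer the
// client-certificate challenge in a URLSessionDelegate instead.
config.urlCredentialStorage?.set(
    URLCredential(identity: identity, certificates: nil, persistence: .forSession),
    for: protectionSpace)
Capability Token Design: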
{
  "capability": "camera.front.once",
  "granted_at": "2026-02-01T12:00:00Z",
  "expires_at": "2026-02-01T12:05:00Z",
  "session_id": "abc123",
  "nonce": "random_bytes",
  "signature": "gateway_signature"
}
Sandboxing:
- All gateway data isolated in App Group container
- No cross-app communication except via Gateway
- Background refresh disabled for sensitive capabilities
- Screen recording detection → auto-revoke camera capability
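The capability token fields (`granted_at`, `expires_at`, `session_id`, `nonce`) suggest a handful of checks before a capability is honored. The sketch below covers only those field checks; signature verification is deliberately omitted, and the single-use nonce rule and the `TokenCheck` result names are assumptions rather than part of the design.

```typescript
interface CapabilityToken {
  capability: string;
  granted_at: string;
  expires_at: string;
  session_id: string;
  nonce: string;
  signature: string; // not verified in this sketch
}

type TokenCheck =
  | "ok"
  | "malformed"
  | "wrong_session"
  | "not_yet_valid"
  | "expired"
  | "replayed";

function checkCapabilityToken(
  token: CapabilityToken,
  sessionId: string,
  nowMs: number,
  seenNonces: Set<string>,
): TokenCheck {
  const grantedAt = Date.parse(token.granted_at);
  const expiresAt = Date.parse(token.expires_at);
  if (Number.isNaN(grantedAt) || Number.isNaN(expiresAt)) return "malformed";
  if (token.session_id !== sessionId) return "wrong_session";
  if (nowMs < grantedAt) return "not_yet_valid";
  if (nowMs >= expiresAt) return "expired";
  // Assumption: each token is consumed once, so a repeated nonce is a replay.
  if (seenNonces.has(token.nonce)) return "replayed";
  seenNonces.add(token.nonce);
  return "ok";
}
```

An auto-revoke event such as screen recording detection could be modeled by discarding the token on the client side, so later checks never reach this function.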
Attention Model Implementation:
Attention Indicator

| Level | Indicator |
|---|---|
| PASSIVE | No indicator |
| NOTIFY | Badge only |
| ACTIVE | Subtle glow |
| INTERRUPT | Modal appears |
| URGENT | Haptic + sound |
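The five attention levels form an ordered scale, which can be captured as a lookup plus a comparison. This is a sketch: the indicator strings come from the attention model above, while `highestAttention` and its rule (show the most intrusive pending level) are assumptions for illustration.

```typescript
type AttentionLevel = "passive" | "notify" | "active" | "interrupt" | "urgent";

// Least to most intrusive, in the order the attention model lists them.
const ATTENTION_ORDER: readonly AttentionLevel[] = [
  "passive",
  "notify",
  "active",
  "interrupt",
  "urgent",
];

const ATTENTION_INDICATOR: Record<AttentionLevel, string> = {
  passive: "No indicator",
  notify: "Badge only",
  active: "Subtle glow",
  interrupt: "Modal appears",
  urgent: "Haptic + sound",
};

function attentionRank(level: AttentionLevel): number {
  return ATTENTION_ORDER.indexOf(level);
}

// Assumption: when several messages are pending, the UI shows the most intrusive level.
function highestAttention(levels: AttentionLevel[]): AttentionLevel {
  let highest: AttentionLevel = "passive";
  for (const level of levels) {
    if (attentionRank(level) > attentionRank(highest)) highest = level;
  }
  return highest;
}

const pending: AttentionLevel[] = ["notify", "interrupt", "active"];
console.log(ATTENTION_INDICATOR[highestAttention(pending)]); // "Modal appears"
```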
Panic Abort Design:
- Triple-tap anywhere → Immediate session termination
- Shake gesture → Quick capability revoke sheet
- Volume buttons (long press) → Emergency disconnect
- Visual: Red pulsing border when agent has control
Capability Grant Flow (User-First):
┌────────────────────────────────────────┐
│ 🎤 Microphone Access Request           │
├────────────────────────────────────────┤
│                                        │
│ Agent "Claude" is requesting:          │
│   • Microphone access                  │
│   • Duration: 60 seconds               │
│   • Purpose: Voice transcription       │
│                                        │
│ ┌──────────────┐  ┌──────────────┐     │
│ │     Deny     │  │  Allow 60s   │     │
│ └──────────────┘  └──────────────┘     │
│                                        │
│ [ ] Remember for this session          │
│                                        │
│ You can revoke anytime by shaking      │
└────────────────────────────────────────┘
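The user's choice in the grant sheet has to travel back to the gateway as a `capability_response` message. A minimal sketch of that mapping follows, reusing the `ClientMessage` shape from the gateway protocol. The payload keys (`granted`, `duration_seconds`, `remember_for_session`), the `GrantDecision` type, and the millisecond timestamp are assumptions, since the design leaves the payload untyped.

```typescript
interface ClientMessage {
  type: "text" | "voice" | "canvas_event" | "capability_response" | "heartbeat";
  session_id: string;
  timestamp: number;
  payload: Record<string, unknown>;
}

// Hypothetical model of what the user selected in the grant sheet.
interface GrantDecision {
  capability: string;
  allowed: boolean;
  durationSeconds: number;
  rememberForSession: boolean;
}

function buildCapabilityResponse(
  sessionId: string,
  decision: GrantDecision,
  nowMs: number,
): ClientMessage {
  return {
    type: "capability_response",
    session_id: sessionId,
    timestamp: nowMs,
    payload: {
      capability: decision.capability,
      granted: decision.allowed,
      // A denial never carries a duration or a remembered grant.
      duration_seconds: decision.allowed ? decision.durationSeconds : 0,
      remember_for_session: decision.allowed && decision.rememberForSession,
    },
  };
}
```

Zeroing the duration on denial means a "Remember for this session" checkbox left ticked cannot turn a refusal into a standing grant.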
Canvas Interaction Model:
- Tap: Select/activate element
- Long press: Context menu
- Two-finger pan: Navigate canvas
- Pinch: Zoom
- Draw mode: Explicit toggle (pencil icon)
Watch UI:
┌─────────────────────┐
│ 🟢 Claude Active    │
│                     │
│ "Checking your      │
│  calendar..."       │
│                     │
│ [🎤 Speak]          │
│                     │
│ ⚡ 2 pending        │
└─────────────────────┘
Gateway Protocol (WebSocket Messages):
// Rect, Style, and Data are assumed to be defined elsewhere in the protocol package.

// Client → Gateway
interface ClientMessage {
  type: "text" | "voice" | "canvas_event" | "capability_response" | "heartbeat";
  session_id: string;
  timestamp: number;
  payload: Record<string, unknown>;
}

// Gateway → Client
interface GatewayMessage {
  type: "text" | "canvas_update" | "capability_request" | "attention" | "status";
  agent_id: string;
  attention_level: "passive" | "notify" | "active" | "interrupt" | "urgent";
  payload: Record<string, unknown>;
}

// Canvas Commands
interface CanvasCommand {
  action: "clear" | "draw" | "update" | "animate";
  elements: CanvasElement[];
  transition?: "instant" | "fade" | "slide";
}

interface CanvasElement {
  id: string;
  type: "text" | "image" | "shape" | "video" | "highlight";
  bounds: Rect;
  content: string | Data;
  style?: Style;
  interactive?: boolean;
}
Multimodal Input Pipeline:
Voice Input:
  Microphone → AVAudioEngine → Speech.framework (local)
    → Text + confidence → Gateway
  OR (streaming):
  Microphone → AVAudioEngine → Opus encode
    → WebSocket binary frames → Gateway
Drawing Input:
  PencilKit PKDrawing → Rasterize to PNG
    → Base64 → Gateway
Camera:
  AVCaptureSession → CVPixelBuffer → JPEG compress
    → Base64 or separate upload endpoint → Gateway
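The drawing path (PNG bytes, then Base64, then the gateway) can be sketched end to end as follows. The Base64 encoder is written out so the example has no platform dependencies. The `canvas_event` payload keys (`kind`, `mime`, `data_base64`) and the function names are assumptions for illustration, not part of the protocol definition.

```typescript
interface ClientMessage {
  type: "text" | "voice" | "canvas_event" | "capability_response" | "heartbeat";
  session_id: string;
  timestamp: number;
  payload: Record<string, unknown>;
}

const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Standard Base64 with '=' padding, processing three bytes at a time.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0; // 0 when past the end; padded below
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    out += i + 1 < bytes.length ? B64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Wraps rasterized PNG bytes in a canvas_event message (payload keys are hypothetical).
function buildDrawingMessage(
  sessionId: string,
  png: Uint8Array,
  nowMs: number,
): ClientMessage {
  return {
    type: "canvas_event",
    session_id: sessionId,
    timestamp: nowMs,
    payload: { kind: "drawing", mime: "image/png", data_base64: toBase64(png) },
  };
}
```

Base64 inflates the payload by roughly a third, which is one reason the camera path above also lists a separate upload endpoint as an alternative.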
Canvas Rendering Strategy:
Option A: SwiftUI Canvas (Recommended for MVP)
- Native, no dependencies
- Good for 2D graphics, text, images
- Limited 3D support
Option B: SpriteKit (If heavy animation needed)
- Better for particle effects, complex animations
- More memory overhead
Option C: Metal (Future, not MVP)
- For 3D, shaders, AR integration
Recommendation: SwiftUI Canvas for MVP, with ability to embed SpriteKit scenes for specific animations.
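Whichever renderer is chosen, the `CanvasCommand` actions can be applied to a renderer-independent element map first, so a later move from SwiftUI Canvas to SpriteKit only changes the drawing step. The reducer below is a sketch under assumed semantics: `clear` empties the canvas, `draw` adds or replaces elements, `update` only touches existing ids, and `animate` leaves state unchanged. The `Rect` shape and the string-only `content` are simplifications.

```typescript
interface Rect { x: number; y: number; width: number; height: number } // assumed shape

interface CanvasElement {
  id: string;
  type: "text" | "image" | "shape" | "video" | "highlight";
  bounds: Rect;
  content: string; // simplified: binary content omitted in this sketch
}

interface CanvasCommand {
  action: "clear" | "draw" | "update" | "animate";
  elements: CanvasElement[];
}

// Returns a new element map; the input map is not mutated.
function applyCommand(
  state: Map<string, CanvasElement>,
  command: CanvasCommand,
): Map<string, CanvasElement> {
  const next = new Map<string, CanvasElement>(state);
  switch (command.action) {
    case "clear":
      next.clear();
      break;
    case "draw":
      command.elements.forEach((el) => next.set(el.id, el));
      break;
    case "update":
      // Assumption: updates to unknown ids are ignored rather than created.
      command.elements.forEach((el) => {
        if (next.has(el.id)) next.set(el.id, el);
      });
      break;
    case "animate":
      // Animation is a renderer concern; the element map is unchanged.
      break;
  }
  return next;
}
```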
App Store Compliance Analysis:
| Feature | App Store Risk | Mitigation |
|---|---|---|
| AI Chat | ✅ Low | Common pattern (ChatGPT, etc.) |
| Camera Access | ✅ Low | Clear purpose string |
| Background Location | | Use "While Using" only |
| Remote Control | | User must initiate all actions |
| Canvas Scripting | | No arbitrary code execution |
| Sensor Streaming | | Clear disclosure in UI |
| BLE Scanning | 🔴 High | Only with explicit use case |
Required Privacy Strings (Info.plist):
<key>NSCameraUsageDescription</key>
<string>Share photos and video with your AI assistant</string>
<key>NSMicrophoneUsageDescription</key>
<string>Send voice messages to your AI assistant</string>
<key>NSLocationWhenInUseUsageDescription</key>
<string>Share your location with your AI assistant when asked</string>
<key>NSBluetoothAlwaysUsageDescription</key>
<string>Connect to nearby devices when requested by your assistant</string>
Feature Gating Strategy:
// Capability is assumed to be a CaseIterable, Hashable enum defined elsewhere.
enum DistributionChannel {
    case appStore    // Conservative
    case testFlight  // Extended
    case enterprise  // Full

    var allowedCapabilities: Set<Capability> {
        switch self {
        case .appStore:
            return [.camera, .microphone, .locationWhenInUse, .canvas]
        case .testFlight:
            // Start from the App Store set and add the extended capabilities.
            return DistributionChannel.appStore.allowedCapabilities
                .union([.backgroundLocation, .ble, .screenRecording])
        case .enterprise:
            return Set(Capability.allCases)
        }
    }
}
TestFlight Strategy:
- Submit conservative build to App Store review
- Enable additional features via remote config for TestFlight users
- Monitor for policy violations before wider release
- Document all capabilities clearly in App Review notes
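One way to combine the channel ceiling with remote config is to treat the App Store set as always on and require an explicit remote flag for anything beyond it. The sketch below illustrates that rule only. The seven capability names are taken from the feature gating enum, while treating them as the complete enterprise set, the `RemoteFlags` shape, and the function names are assumptions.

```typescript
type Channel = "appStore" | "testFlight" | "enterprise";
type Capability =
  | "camera"
  | "microphone"
  | "locationWhenInUse"
  | "canvas"
  | "backgroundLocation"
  | "ble"
  | "screenRecording";

const APP_STORE_SET: readonly Capability[] = [
  "camera",
  "microphone",
  "locationWhenInUse",
  "canvas",
];
const EXTENDED_SET: readonly Capability[] = [
  "backgroundLocation",
  "ble",
  "screenRecording",
];

// Upper bound per channel. Assumption: enterprise covers the same seven capabilities.
function channelCeiling(channel: Channel): Capability[] {
  return channel === "appStore"
    ? [...APP_STORE_SET]
    : [...APP_STORE_SET, ...EXTENDED_SET];
}

type RemoteFlags = Partial<Record<Capability, boolean>>;

// Baseline capabilities are always on; extended ones need a remote flag set to true.
function enabledCapabilities(channel: Channel, flags: RemoteFlags): Set<Capability> {
  const baseline = new Set<Capability>(APP_STORE_SET);
  const enabled = channelCeiling(channel).filter(
    (cap) => baseline.has(cap) || flags[cap] === true,
  );
  return new Set<Capability>(enabled);
}
```

Because the ceiling is applied before the flags, a misconfigured remote flag cannot enable an extended capability in an App Store build.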
| Question | Decision | Rationale |
|---|---|---|
| Protocol | WebSocket/TLS | Native support, bidirectional, efficient |
| Auth | Ed25519 + JWT-like tokens | Modern, fast, compact |
| Canvas | SwiftUI Canvas | Native, sufficient for MVP |
| Watch Comms | WatchConnectivity | Only Apple-approved method |
| State | Swift Actors | Thread-safe, modern |
| Storage | Keychain + UserDefaults | Secure for tokens, fast for prefs |
- QR code pairing
- WebSocket connection
- Text chat (send/receive)
- Basic canvas (text + images)
- Capability request/grant UI
- Watch status glance
- Panic abort (shake)
- Voice input (PTT)
- Canvas drawing input
- Camera capture
- Multiple agent support
- Watch voice input
- Video playback in canvas
- Streaming voice (duplex)
- BLE device control
- Background location (TestFlight)
- Canvas animations
OpenClawCompanion/
├── App/
│   ├── OpenClawCompanionApp.swift
│   ├── Views/
│   │   ├── Chat/
│   │   ├── Canvas/
│   │   ├── Settings/
│   │   └── Pairing/
│   ├── ViewModels/
│   └── Resources/
├── Watch/
│   ├── OpenClawWatchApp.swift
│   ├── Views/
│   └── Complications/
├── Shared/
│   ├── Models/
│   └── Extensions/
├── Packages/
│   └── OpenClawKit/
│       ├── Sources/
│       │   ├── Networking/
│       │   ├── Security/
│       │   └── Protocol/
│       └── Tests/
├── project.yml  # XcodeGen
└── README.md
Document version: 1.0
Next review: After MVP implementation