39 changes: 39 additions & 0 deletions architecture/gateway.md
@@ -75,6 +75,7 @@ The storage schema is intentionally narrow:
| `version` | Optional monotonically increasing version for scoped records. |
| `status` | Optional workflow state for records such as policy revisions or draft policy chunks. |
| `dedup_key` and `hit_count` | Optional policy-advisor fields for coalescing repeated observations. |
| `resource_version` | Monotonically increasing counter for optimistic concurrency control. Incremented atomically on each update. |
| `payload` | Prost-encoded protobuf payload for the full domain object. |
| `created_at_ms` and `updated_at_ms` | Gateway timestamps used for ordering and list output. |
| `labels` | JSON object carrying Kubernetes-style object labels for filtering and organization. |
@@ -99,6 +100,44 @@ scope semantics.
Persisted state includes sandboxes, providers, SSH sessions, policy revisions,
settings, inference configuration, and deployment records.

### Optimistic Concurrency (CAS)

Every object row carries a `resource_version` that the database increments
atomically on each write. Concurrent mutations use compare-and-swap (CAS): the
writer reads the current version, applies changes, and writes back with a
`WHERE resource_version = <expected>` guard. If another writer updated the row
between the read and the write, the guard matches no rows, the write is
rejected, and the caller retries with fresh state.
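
A minimal sketch of the guard, assuming a hypothetical in-memory `Store` that
stands in for the `objects` table. The names `Row`, `Store`, and
`compare_and_swap` are illustrative and are not part of the gateway; the real
persistence layer issues a conditional SQL `UPDATE` instead.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one row in the objects table.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    resource_version: u64,
    payload: String,
}

/// Hypothetical in-memory store; the real gateway uses a SQL database.
#[derive(Default)]
struct Store {
    rows: HashMap<String, Row>,
}

impl Store {
    /// Writes `payload` only if the stored version still equals `expected`,
    /// mirroring `UPDATE ... WHERE resource_version = <expected>`.
    /// Returns the new version on success, or `None` if the row is missing
    /// or the guard failed.
    fn compare_and_swap(&mut self, id: &str, expected: u64, payload: &str) -> Option<u64> {
        let row = self.rows.get_mut(id)?;
        if row.resource_version != expected {
            return None; // another writer updated the row first
        }
        row.resource_version += 1;
        row.payload = payload.to_string();
        Some(row.resource_version)
    }
}

fn main() {
    let mut store = Store::default();
    store.rows.insert(
        "sb-1".to_string(),
        Row { resource_version: 1, payload: "a".to_string() },
    );

    // Two writers both read version 1; only the first write passes the guard.
    let first = store.compare_and_swap("sb-1", 1, "b");
    let second = store.compare_and_swap("sb-1", 1, "c");

    // prints "first = Some(2), second = None"
    println!("first = {first:?}, second = {second:?}");
}
```

The losing writer sees `None` and must re-read the row before trying again,
which is the retry described above.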

This matters for HA deployments where multiple gateway replicas share the same
Postgres database, and for single-node deployments where concurrent gRPC
handlers or the reconciler mutate the same sandbox.

**When to use CAS** -- any mutation that merges caller-supplied fields into an
existing object:

- Provider credential and config updates (merge maps).
- Sandbox provider attach/detach (append/remove from a list).
- Policy version bumps and draft operations.
- Compute status updates (sandbox phase transitions and reconciliation).

**When CAS is not needed** -- create operations that generate a unique ID
(conflicts are caught by the primary key constraint), unconditional deletes,
and idempotent overwrites where the full payload is self-contained.

The `update_message_cas` helper encapsulates the retry loop: it fetches the
latest object, applies a mutation closure, and attempts the conditional write.
On conflict it re-fetches and retries, for at most 5 attempts.
If every attempt conflicts, the persistence layer returns a `Conflict` error,
which gRPC handlers map to `ABORTED` status so clients can retry with current
data.
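
The retry loop can be sketched roughly as follows. This is not the actual
`update_message_cas` implementation: `Store`, `Object`, `StoreError`,
`put_if_version`, and `update_with_cas` are hypothetical names, the store is an
in-memory map behind a `std::sync::Mutex`, and the attempt budget of 5 is taken
from the paragraph above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Attempt budget, as described in the text above.
const MAX_ATTEMPTS: usize = 5;

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound,
    Conflict,
}

/// Hypothetical stored object with a version and some mutable state.
#[derive(Debug, Clone, PartialEq)]
struct Object {
    resource_version: u64,
    labels: Vec<String>,
}

/// Hypothetical in-memory store standing in for the database.
#[derive(Default)]
struct Store {
    rows: Mutex<HashMap<String, Object>>,
}

impl Store {
    fn get(&self, id: &str) -> Option<Object> {
        self.rows.lock().unwrap().get(id).cloned()
    }

    /// Conditional write: succeeds only if the stored version equals `expected`.
    fn put_if_version(&self, id: &str, expected: u64, obj: &Object) -> Option<u64> {
        let mut rows = self.rows.lock().unwrap();
        let current = rows.get_mut(id)?;
        if current.resource_version != expected {
            return None;
        }
        *current = Object { resource_version: expected + 1, labels: obj.labels.clone() };
        Some(expected + 1)
    }
}

/// Fetch, mutate, conditionally write; re-fetch and retry on conflict.
fn update_with_cas<F>(store: &Store, id: &str, mut mutate: F) -> Result<Object, StoreError>
where
    F: FnMut(&mut Object),
{
    for _ in 0..MAX_ATTEMPTS {
        let mut obj = store.get(id).ok_or(StoreError::NotFound)?;
        let expected = obj.resource_version;
        mutate(&mut obj);
        if let Some(new_version) = store.put_if_version(id, expected, &obj) {
            obj.resource_version = new_version;
            return Ok(obj);
        }
        // Guard failed: loop to re-fetch fresh state and reapply the closure.
    }
    Err(StoreError::Conflict)
}

fn main() {
    let store = Store::default();
    store.rows.lock().unwrap().insert(
        "provider-1".to_string(),
        Object { resource_version: 1, labels: vec![] },
    );
    let updated = update_with_cas(&store, "provider-1", |o| o.labels.push("gpu".to_string()));
    println!("{updated:?}");
}
```

Because the closure is reapplied to freshly fetched state on every attempt, it
should be a pure merge of the caller's changes and safe to run more than once.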

Settings updates are an exception: they use a Tokio `Mutex` instead of CAS
because settings operations require multi-step validation that is simpler under
an exclusive lock than within a retry loop.

The `resource_version` is surfaced to clients through `ObjectMeta` in proto
responses. Database migrations backfill existing rows with version 1.

Policy and runtime settings are delivered together through the effective sandbox
config path. A gateway-global policy can override sandbox-scoped policy. The
sandbox supervisor polls for config revisions and hot-reloads dynamic policy
14 changes: 14 additions & 0 deletions crates/openshell-cli/src/run.rs
@@ -2280,6 +2280,11 @@ pub async fn sandbox_get(
println!(" {} {}", "Id:".dimmed(), id);
println!(" {} {}", "Name:".dimmed(), name);
println!(" {} {}", "Phase:".dimmed(), phase_name(sandbox.phase));
println!(
" {} {}",
"Resource version:".dimmed(),
sandbox.metadata.as_ref().map_or(0, |m| m.resource_version)
);

// Display labels if present
if let Some(metadata) = &sandbox.metadata
@@ -2974,6 +2979,7 @@ async fn auto_create_provider(
name: exact_name.to_string(),
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
r#type: provider_type.to_string(),
credentials: discovered.credentials.clone(),
@@ -3014,6 +3020,7 @@ async fn auto_create_provider(
name: name.clone(),
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
r#type: provider_type.to_string(),
credentials: discovered.credentials.clone(),
@@ -3196,6 +3203,7 @@ pub async fn provider_create(
name: name.to_string(),
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
r#type: provider_type.clone(),
credentials: credential_map,
@@ -3240,6 +3248,11 @@ pub async fn provider_get(server: &str, name: &str, tls: &TlsOptions) -> Result<
println!(" {} {}", "Id:".dimmed(), provider.object_id());
println!(" {} {}", "Name:".dimmed(), provider.object_name());
println!(" {} {}", "Type:".dimmed(), provider.r#type);
println!(
" {} {}",
"Resource version:".dimmed(),
provider.metadata.as_ref().map_or(0, |m| m.resource_version)
);
println!(
" {} {}",
"Credential keys:".dimmed(),
@@ -3696,6 +3709,7 @@ pub async fn provider_update(
name: name.to_string(),
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
r#type: String::new(),
credentials: credential_map,
2 changes: 2 additions & 0 deletions crates/openshell-cli/tests/ensure_providers_integration.rs
@@ -113,6 +113,7 @@ impl TestOpenShell {
name: name.to_string(),
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
r#type: provider_type.to_string(),
credentials: HashMap::new(),
@@ -347,6 +348,7 @@ impl OpenShell for TestOpenShell {
name: provider_metadata.name,
created_at_ms: existing_metadata.created_at_ms,
labels: existing_metadata.labels,
resource_version: 0,
}),
r#type: existing.r#type,
credentials: merge(existing.credentials, provider.credentials),
@@ -445,6 +445,7 @@ impl OpenShell for TestOpenShell {
name: provider_metadata.name,
created_at_ms: existing_metadata.created_at_ms,
labels: existing_metadata.labels,
resource_version: 0,
}),
r#type: existing.r#type,
credentials: merge(existing.credentials, provider.credentials),
@@ -121,6 +121,7 @@ impl OpenShell for TestOpenShell {
name: sandbox_name,
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
phase: SandboxPhase::Provisioning as i32,
..Sandbox::default()
@@ -140,6 +141,7 @@ impl OpenShell for TestOpenShell {
name,
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
phase: SandboxPhase::Ready as i32,
..Sandbox::default()
@@ -325,6 +327,7 @@ impl OpenShell for TestOpenShell {
name: sandbox_id.trim_start_matches("id-").to_string(),
created_at_ms: 0,
labels: HashMap::new(),
resource_version: 0,
}),
phase: SandboxPhase::Provisioning as i32,
..Sandbox::default()
@@ -119,6 +119,7 @@ impl OpenShell for TestOpenShell {
name,
created_at_ms: 0,
labels: std::collections::HashMap::new(),
resource_version: 0,
}),
..Default::default()
}),
2 changes: 1 addition & 1 deletion crates/openshell-core/src/lib.rs
@@ -22,7 +22,7 @@ pub mod settings;

pub use config::{ComputeDriverKind, Config, OidcConfig, TlsConfig};
pub use error::{ComputeDriverError, Error, Result};
-pub use metadata::{ObjectId, ObjectLabels, ObjectName};
+pub use metadata::{GetResourceVersion, ObjectId, ObjectLabels, ObjectName, SetResourceVersion};

/// Build version string derived from git metadata.
///
93 changes: 93 additions & 0 deletions crates/openshell-core/src/metadata.rs
@@ -25,6 +25,16 @@ pub trait ObjectLabels {
fn object_labels(&self) -> Option<HashMap<String, String>>;
}

/// Provides mutable access to set the object's resource version from persistence.
pub trait SetResourceVersion {
fn set_resource_version(&mut self, version: u64);
}

/// Provides read access to the object's current resource version.
pub trait GetResourceVersion {
fn get_resource_version(&self) -> u64;
}

// Implementations for Sandbox
impl ObjectId for Sandbox {
fn object_id(&self) -> &str {
@@ -44,6 +54,20 @@ impl ObjectLabels for Sandbox {
}
}

impl SetResourceVersion for Sandbox {
fn set_resource_version(&mut self, version: u64) {
if let Some(meta) = self.metadata.as_mut() {
meta.resource_version = version;
}
}
}

impl GetResourceVersion for Sandbox {
fn get_resource_version(&self) -> u64 {
self.metadata.as_ref().map_or(0, |m| m.resource_version)
}
}

// Implementations for Provider
impl ObjectId for Provider {
fn object_id(&self) -> &str {
@@ -63,6 +87,20 @@ impl ObjectLabels for Provider {
}
}

impl SetResourceVersion for Provider {
fn set_resource_version(&mut self, version: u64) {
if let Some(meta) = self.metadata.as_mut() {
meta.resource_version = version;
}
}
}

impl GetResourceVersion for Provider {
fn get_resource_version(&self) -> u64 {
self.metadata.as_ref().map_or(0, |m| m.resource_version)
}
}

// Implementations for StoredProviderProfile
impl ObjectId for StoredProviderProfile {
fn object_id(&self) -> &str {
@@ -82,6 +120,20 @@ impl ObjectLabels for StoredProviderProfile {
}
}

impl SetResourceVersion for StoredProviderProfile {
fn set_resource_version(&mut self, version: u64) {
if let Some(meta) = self.metadata.as_mut() {
meta.resource_version = version;
}
}
}

impl GetResourceVersion for StoredProviderProfile {
fn get_resource_version(&self) -> u64 {
self.metadata.as_ref().map_or(0, |m| m.resource_version)
}
}

// Implementations for SshSession
impl ObjectId for SshSession {
fn object_id(&self) -> &str {
@@ -101,6 +153,20 @@ impl ObjectLabels for SshSession {
}
}

impl SetResourceVersion for SshSession {
fn set_resource_version(&mut self, version: u64) {
if let Some(meta) = self.metadata.as_mut() {
meta.resource_version = version;
}
}
}

impl GetResourceVersion for SshSession {
fn get_resource_version(&self) -> u64 {
self.metadata.as_ref().map_or(0, |m| m.resource_version)
}
}

// Implementations for InferenceRoute
impl ObjectId for InferenceRoute {
fn object_id(&self) -> &str {
@@ -120,6 +186,20 @@ impl ObjectLabels for InferenceRoute {
}
}

impl SetResourceVersion for InferenceRoute {
fn set_resource_version(&mut self, version: u64) {
if let Some(meta) = self.metadata.as_mut() {
meta.resource_version = version;
}
}
}

impl GetResourceVersion for InferenceRoute {
fn get_resource_version(&self) -> u64 {
self.metadata.as_ref().map_or(0, |m| m.resource_version)
}
}

// Implementations for ObjectForTest (test-only proto type)
impl ObjectId for ObjectForTest {
fn object_id(&self) -> &str {
@@ -138,3 +218,16 @@ impl ObjectLabels for ObjectForTest {
None
}
}

impl SetResourceVersion for ObjectForTest {
fn set_resource_version(&mut self, _version: u64) {
// ObjectForTest doesn't have metadata, so this is a no-op
}
}

impl GetResourceVersion for ObjectForTest {
fn get_resource_version(&self) -> u64 {
// ObjectForTest doesn't have metadata
0
}
}
@@ -0,0 +1,5 @@
-- Add resource_version column for optimistic concurrency control
ALTER TABLE objects ADD COLUMN resource_version BIGINT NOT NULL DEFAULT 1;

-- No explicit backfill statement is needed: the NOT NULL DEFAULT 1 clause
-- gives existing rows resource_version = 1 in PostgreSQL.
@@ -0,0 +1,5 @@
-- Add resource_version column for optimistic concurrency control
ALTER TABLE objects ADD COLUMN resource_version INTEGER NOT NULL DEFAULT 1;

-- No explicit backfill statement is needed: the NOT NULL DEFAULT 1 clause
-- gives existing rows resource_version = 1 in SQLite.