feat(server): declare gRPC auth (mode + scope + role) at the handler, enforce at the router #1586

### Problem Statement

# Issue draft: Per-Handler gRPC Auth Annotations



---

## Problem Statement

The gateway's gRPC auth metadata is currently spread across four hand-maintained
constants in three files:

| Constant | File | Purpose |
|---|---|---|
| `SCOPED_METHODS` | `auth/authz.rs` | Bearer scope per method |
| `ADMIN_METHODS` | `auth/authz.rs` | Admin-vs-user role mapping |
| `UNAUTHENTICATED_METHODS` | `auth/oidc.rs` | Methods that bypass auth entirely |
| `ALLOWED_SANDBOX_METHODS` | `auth/sandbox_methods.rs` | Methods callable by sandbox principals |

This has three concrete consequences:

1. **Asymmetric router enforcement.** The router rejects `Principal::Sandbox`
   when the method isn't in `ALLOWED_SANDBOX_METHODS`, but it does **not** do
   the inverse for `Principal::User`. A Bearer user with `openshell:all` (or
   even just the right scope) is therefore not stopped at the router from
   reaching a handler that is intended for sandbox supervisors. Several
   handlers (`GetSandboxProviderEnvironment`, `ReportPolicyStatus`,
   `SubmitPolicyAnalysis`, `PushSandboxLogs`) rely on `ensure_sandbox_scope`,
   which intentionally lets users through. The streaming RPCs
   (`ConnectSupervisor`, `RelayStream`) don't gate at all before opening
   their bidi channel. End-to-end against an in-cluster Keycloak with an
   `openshell-admin` + `openshell:all` token (full A/B captured below):

   | Method | main today | Expected |
   |---|---|---|
   | `GetSandboxProviderEnvironment` | `NotFound: sandbox not found` (handler reached, store queried) | `PermissionDenied` at router |
   | `ReportPolicyStatus` | `InvalidArgument: sandbox_id is required` (handler validated body) | `PermissionDenied` |
   | `SubmitPolicyAnalysis` | `InvalidArgument: name is required` | `PermissionDenied` |
   | `PushSandboxLogs` | `{}` — **stream accepted, log push succeeded** | `PermissionDenied` |
   | `ConnectSupervisor` | `InvalidArgument: expected SupervisorHello` (bidi stream opened) | `PermissionDenied` |
   | `RelayStream` | `InvalidArgument: first RelayFrame must be init…` (bidi stream opened) | `PermissionDenied` |

   `IssueSandboxToken`, `RefreshSandboxToken`, and `GetInferenceBundle` are
   already safe today because their handlers call
   `ensure_sandbox_principal_scope`, but that safety is per-handler discipline,
   not a structural guarantee.

2. **Silent drift on every new RPC.** A new RPC that lands in the proto but
   doesn't get added to `SCOPED_METHODS` falls back to requiring
   `openshell:all` at runtime — no compile-time error, no test failure unless
   someone wrote specific coverage for it. A method missing from
   `ALLOWED_SANDBOX_METHODS` silently denies sandbox supervisors. A typo in
   one of the hand-written gRPC path strings (e.g.
   `"/openshell.v1.OpenShell/CreateProvider"`) is undetectable until
   production: the proto is the source of truth, but nothing connects the
   two.

3. **Auth posture is not visible at the handler.** A reviewer reading
   `handle_create_provider` has no local signal that the method requires
   admin role and `provider:write` scope — they have to cross-reference
   three files.





## Checklist

- [x] I've reviewed existing issues and the architecture docs.
- [x] This is a design proposal, not a "please build this" request.


### Proposed Design

## Proposed Design

Declare auth metadata at the handler definition site and enforce it at the
router. Replace the four constants with generated tables backed by per-method
annotations, add the missing user-side `AuthMode` check at the router, and
guard the result with compile-time checks in the macro plus a
descriptor-set-driven exhaustiveness test.

### Auth model

| Auth mode | `Principal::Sandbox`? | Bearer? | Scope applies? | Role applies? | Examples |
|---|---|---|---|---|---|
| `unauthenticated` | n/a | n/a | no | no | `Health`, gRPC reflection (handled by prefix, not annotation) |
| `sandbox` | yes | no | no | no | `ReportPolicyStatus`, `PushSandboxLogs`, `GetInferenceBundle` |
| `bearer` | no | yes | yes | yes | `ListSandboxes`, `CreateProvider`, `SetClusterInference` |
| `dual` | yes | yes | yes (Bearer path only) | yes (Bearer path only) | `GetSandboxConfig`, `UpdateConfig`, `GetDraftPolicy` |

`sandbox` auth uses the per-sandbox gateway-minted JWT introduced in #1404 —
the old shared sandbox secret no longer exists. A handler annotated `sandbox`
authenticates as a specific `Principal::Sandbox`; handlers still perform a
same-sandbox check on the request body where applicable.

Roles are coarse (`admin` or `user`). Scopes are fine-grained (`sandbox:read`,
`provider:write`, etc.).
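
As a rough illustration of the table above, the four modes can be modelled as an enum with one predicate per column. This is only a sketch: the variant names follow the generated-table example in this proposal, but the helper methods (`allows_sandbox`, `allows_bearer`, `checks_scope_and_role`) are hypothetical names, not part of the proposed API. For `unauthenticated`, where the table says "n/a", the helpers simply return `false` because no principal is evaluated.

```rust
/// Hypothetical mirror of the proposal's `AuthMode`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AuthMode {
    Unauthenticated,
    Sandbox,
    Bearer,
    Dual,
}

impl AuthMode {
    /// True when a `Principal::Sandbox` may call the method.
    pub fn allows_sandbox(self) -> bool {
        matches!(self, AuthMode::Sandbox | AuthMode::Dual)
    }

    /// True when a Bearer-authenticated user may call the method.
    pub fn allows_bearer(self) -> bool {
        matches!(self, AuthMode::Bearer | AuthMode::Dual)
    }

    /// Scope and role are only evaluated on the Bearer path.
    pub fn checks_scope_and_role(self) -> bool {
        self.allows_bearer()
    }
}
```

Keeping the predicates on the enum means a new mode would force every predicate to be revisited in one place.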

### Per-handler annotation

```rust
#[rpc_authz(service = "openshell.v1.OpenShell")]
#[tonic::async_trait]
impl OpenShell for OpenShellService {
    #[rpc_auth(auth = "unauthenticated")]
    async fn health(...) -> Result<_, Status> { ... }

    #[rpc_auth(auth = "bearer", scope = "sandbox:read", role = "user")]
    async fn list_sandboxes(...) -> Result<_, Status> { ... }

    #[rpc_auth(auth = "bearer", scope = "provider:write", role = "admin")]
    async fn create_provider(...) -> Result<_, Status> { ... }

    #[rpc_auth(auth = "dual", scope = "config:read", role = "user")]
    async fn get_sandbox_config(...) -> Result<_, Status> { ... }

    #[rpc_auth(auth = "sandbox")]
    async fn report_policy_status(...) -> Result<_, Status> { ... }
}
```

### What the macros generate

`#[rpc_authz]` is a new impl-level attribute macro (first proc macro in the
workspace, in a small `openshell-server-macros` crate). It inspects each
method's `#[rpc_auth]` attribute and emits, adjacent to the impl block:

```rust
pub const OPEN_SHELL_AUTH_METADATA: &[MethodAuth] = &[
    MethodAuth {
        path: "/openshell.v1.OpenShell/Health",
        mode: AuthMode::Unauthenticated,
        scope: None,
        role: None,
    },
    MethodAuth {
        path: "/openshell.v1.OpenShell/ListSandboxes",
        mode: AuthMode::Bearer,
        scope: Some("sandbox:read"),
        role: Some(Role::User),
    },
    // ...
];
```

The const name is derived from the trait identifier in the impl
(`impl OpenShell for ...` → `OPEN_SHELL_AUTH_METADATA`). Paths are derived
from the `service = "..."` argument and the snake_case method name converted
to PascalCase, so no path string is written by hand and a path cannot drift
from the method name on the tonic-generated trait.
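
The path derivation described above could look roughly like the following. It is a plain-function sketch of what the macro would do at expansion time. The function names `to_pascal_case` and `grpc_path` are assumptions made for illustration, and the real macro would operate on identifiers rather than strings.

```rust
/// Convert a snake_case Rust method name to the PascalCase RPC name.
pub fn to_pascal_case(method: &str) -> String {
    method
        .split('_')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first character, keep the rest unchanged.
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect()
}

/// Build the full gRPC path from the `service = "..."` argument and the
/// Rust method name.
pub fn grpc_path(service: &str, method: &str) -> String {
    format!("/{}/{}", service, to_pascal_case(method))
}
```

For example, `grpc_path("openshell.v1.OpenShell", "list_sandboxes")` yields `/openshell.v1.OpenShell/ListSandboxes`. An RPC name with consecutive capitals would not survive this round trip, which is the case the explicit path override under "Risks and constraints" is meant for.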

The macro strips `#[rpc_auth(...)]` attributes from the methods before
re-emitting the impl block, so `#[tonic::async_trait]` sees a normal impl.

`MethodAuth`, `AuthMode`, and `Role` live in `openshell-server`
(`auth/method_authz.rs`). The macro emits `crate::auth::method_authz::*`
paths; that only needs to work from inside `openshell-server`.

### Compile-time enforcement

`#[rpc_authz]` fails compilation when:

- An RPC method is missing `#[rpc_auth]`.
- An `auth = "unauthenticated"` or `auth = "sandbox"` method is annotated
  with `scope` or `role`.
- An `auth = "bearer"` or `auth = "dual"` method is missing `scope` or `role`.
- Two methods on the same service produce the same path.
- The same key (`auth`, `scope`, or `role`) appears twice in one
  `#[rpc_auth(...)]`.
- An invalid auth mode or role string is supplied.
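
The rules above can be modelled as ordinary functions over the parsed attribute values. This is a sketch under stated assumptions, not the macro itself. The real implementation would work on token streams and report failures through `compile_error!`, and `AnnotationError`, `validate`, and `first_duplicate` are hypothetical names. The missing-`#[rpc_auth]` and duplicate-key rules are parser-level checks and are left out here.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum AnnotationError {
    ScopeOrRoleNotAllowed,
    ScopeOrRoleMissing,
    UnknownAuthMode,
    UnknownRole,
}

/// Check one parsed `#[rpc_auth(...)]` against the mode/scope/role rules.
pub fn validate(
    auth: &str,
    scope: Option<&str>,
    role: Option<&str>,
) -> Result<(), AnnotationError> {
    // Roles are coarse: only `admin` and `user` are valid strings.
    if let Some(r) = role {
        if r != "admin" && r != "user" {
            return Err(AnnotationError::UnknownRole);
        }
    }
    match auth {
        // These modes never evaluate scope or role, so neither may be given.
        "unauthenticated" | "sandbox" if scope.is_some() || role.is_some() => {
            Err(AnnotationError::ScopeOrRoleNotAllowed)
        }
        "unauthenticated" | "sandbox" => Ok(()),
        // The Bearer path needs both scope and role.
        "bearer" | "dual" if scope.is_none() || role.is_none() => {
            Err(AnnotationError::ScopeOrRoleMissing)
        }
        "bearer" | "dual" => Ok(()),
        _ => Err(AnnotationError::UnknownAuthMode),
    }
}

/// Return the first path that appears twice within one service, if any.
pub fn first_duplicate<'a>(paths: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    paths.iter().copied().find(|p| !seen.insert(*p))
}
```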

### Aggregation

The macro emits one `pub const` per service. Aggregation is a manual one-liner
in a new module `auth/method_authz.rs`:

```rust
const SERVICES: &[&[MethodAuth]] = &[
    crate::grpc::OPEN_SHELL_AUTH_METADATA,
    crate::inference::INFERENCE_AUTH_METADATA,
];

pub fn lookup(method: &str) -> Option<&'static MethodAuth> {
    SERVICES.iter().flat_map(|s| s.iter()).find(|m| m.path == method)
}
```

This is the single source of truth queried by `authz.rs`, `oidc.rs`, and
`sandbox_methods.rs`. No `inventory` crate, no linker tricks, no runtime
initialization.

### Router enforcement (closes problem #1)

`AuthGrpcRouter` already checks `is_sandbox_callable` for `Principal::Sandbox`.
Add the mirror for `Principal::User` via a new
`method_authz::is_user_callable(path)`:

```rust
Principal::User(ref user) => {
    if !method_authz::is_user_callable(&path) {
        return Ok(status_response(
            tonic::Status::permission_denied(
                "this method requires a sandbox principal")));
    }
    if let Some(policy) = authz_policy {
        if let Err(s) = policy.check(&user.identity, &path) { return ...; }
    }
}
```

`is_user_callable` returns `true` for `Bearer` / `Dual` (the only modes a
user principal should reach), `false` for `Sandbox` / `Unauthenticated`,
and `true` for unknown methods so `AuthzPolicy::check` still gets to apply
the `openshell:all` fallback (defense-in-depth, see below).
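
A minimal sketch of the predicate described above, assuming a single flat table in place of the aggregated generated tables. The trimmed `MethodAuth` (path and mode only) and the `TABLE` contents are stand-ins for illustration.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum AuthMode {
    Unauthenticated,
    Sandbox,
    Bearer,
    Dual,
}

pub struct MethodAuth {
    pub path: &'static str,
    pub mode: AuthMode,
}

// Hypothetical stand-in for the aggregated per-service tables.
const TABLE: &[MethodAuth] = &[
    MethodAuth { path: "/openshell.v1.OpenShell/Health", mode: AuthMode::Unauthenticated },
    MethodAuth { path: "/openshell.v1.OpenShell/ListSandboxes", mode: AuthMode::Bearer },
    MethodAuth { path: "/openshell.v1.OpenShell/GetSandboxConfig", mode: AuthMode::Dual },
    MethodAuth { path: "/openshell.v1.OpenShell/ReportPolicyStatus", mode: AuthMode::Sandbox },
];

pub fn lookup(path: &str) -> Option<&'static MethodAuth> {
    TABLE.iter().find(|m| m.path == path)
}

/// Router-side check for `Principal::User`.
pub fn is_user_callable(path: &str) -> bool {
    match lookup(path).map(|m| m.mode) {
        Some(AuthMode::Bearer) | Some(AuthMode::Dual) => true,
        Some(AuthMode::Sandbox) | Some(AuthMode::Unauthenticated) => false,
        // Unknown methods pass through so the policy check can still apply
        // the `openshell:all` fallback.
        None => true,
    }
}
```

Spelling out every variant in the `match`, rather than using a catch-all arm, means adding a mode later fails to compile until this predicate is updated.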

### What this replaces

| Today | After |
|---|---|
| `SCOPED_METHODS` in `auth/authz.rs` | `method_authz::required_scope()` reading from generated tables |
| `ADMIN_METHODS` in `auth/authz.rs` | `method_authz::required_role()` reading from generated tables |
| `UNAUTHENTICATED_METHODS` in `auth/oidc.rs` | `method_authz::is_unauthenticated()` reading from generated tables |
| `ALLOWED_SANDBOX_METHODS` in `auth/sandbox_methods.rs` | `method_authz::is_sandbox_callable()` reading from generated tables |
| `UNAUTHENTICATED_PREFIXES` in `auth/oidc.rs` | Stays — prefix matching for `/grpc.reflection.*` and `/grpc.health.*` is structural, not per-method |

### Exhaustiveness test (closes problem #2)

`openshell-core/build.rs` is extended to emit a binary `FileDescriptorSet`
via `tonic_build::configure().file_descriptor_set_path(...)`. The descriptor
is exposed as `openshell_core::FILE_DESCRIPTOR_SET` (a `&'static [u8]`).

A test in `openshell-server` parses the descriptor, enumerates every
`(service, method)` pair, and verifies each one is covered exactly once by
the aggregated `MethodAuth` tables (or matches one of the prefix-bypassed
paths). Failure modes:

- A new RPC is added to a proto but no annotation lands → test fails loudly.
- A method appears with two different annotations across services → test fails.
- An annotated path doesn't match any real proto RPC → test fails (catches
  stale annotations after a rename).
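
The three failure modes can be sketched as a set comparison. This version takes the proto paths as a plain slice. The real test would first obtain them by decoding `openshell_core::FILE_DESCRIPTOR_SET`, which is omitted here. `CoverageReport` and `check_coverage` are hypothetical names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct CoverageReport {
    /// Present in the proto, but no annotation and no prefix bypass.
    pub missing: Vec<String>,
    /// Annotated more than once across the aggregated tables.
    pub duplicated: Vec<String>,
    /// Annotated, but no such RPC exists in the proto.
    pub stale: Vec<String>,
}

pub fn check_coverage(
    proto_paths: &[&str],
    annotated: &[&str],
    bypass_prefixes: &[&str],
) -> CoverageReport {
    let proto: BTreeSet<&str> = proto_paths.iter().copied().collect();
    let mut seen: BTreeSet<&str> = BTreeSet::new();
    let mut report = CoverageReport::default();

    for path in annotated.iter().copied() {
        if !seen.insert(path) {
            report.duplicated.push(path.to_string());
            continue;
        }
        if !proto.contains(&path) {
            report.stale.push(path.to_string());
        }
    }

    for &path in &proto {
        let bypassed = bypass_prefixes.iter().any(|p| path.starts_with(*p));
        if !bypassed && !seen.contains(&path) {
            report.missing.push(path.to_string());
        }
    }
    report
}
```

The test would then assert that the report equals `CoverageReport::default()`, so a failure message names the offending paths.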

The exhaustiveness test is the *primary* safety net. The runtime keeps the
`openshell:all` fallback for unknown methods (preserved
`unknown_method_requires_openshell_all` test) as defense in depth: if a
future refactor introduces a code path the test can't see (e.g. a method
routed through the server without appearing in the gateway-facing
descriptor set, or the aggregation list drifts), an unknown method still
requires the all-scope rather than falling open. The two layers are
deliberate and complementary.
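
The runtime fallback might look like this in miniature. It is a sketch only: the trimmed `MethodAuth`, the `TABLE` contents, and the `token_allows` helper are assumptions for illustration, and the real check lives behind `AuthzPolicy::check`.

```rust
pub struct MethodAuth {
    pub path: &'static str,
    pub scope: &'static str,
}

// Hypothetical stand-in for the Bearer entries of the generated tables.
const TABLE: &[MethodAuth] = &[
    MethodAuth { path: "/openshell.v1.OpenShell/ListSandboxes", scope: "sandbox:read" },
    MethodAuth { path: "/openshell.v1.OpenShell/CreateProvider", scope: "provider:write" },
];

const ALL_SCOPE: &str = "openshell:all";

/// Scope a Bearer token must carry for `path`. A method with no table entry
/// requires the all-scope instead of falling open.
pub fn required_scope(path: &str) -> &'static str {
    TABLE
        .iter()
        .find(|m| m.path == path)
        .map(|m| m.scope)
        .unwrap_or(ALL_SCOPE)
}

/// A token passes if it holds the required scope or the all-scope.
pub fn token_allows(token_scopes: &[&str], path: &str) -> bool {
    let needed = required_scope(path);
    token_scopes.iter().any(|s| *s == needed || *s == ALL_SCOPE)
}
```

With this shape, a narrowly scoped token is refused on any path the tables do not know about, while an `openshell:all` token keeps working.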

### Implementation outline

1. **Macro crate, types, annotations.** Add `crates/openshell-server-macros/`
   with `#[rpc_authz]` + `#[rpc_auth]`. Add `MethodAuth`, `AuthMode`, `Role`,
   and the aggregator in `auth/method_authz.rs`. Annotate every RPC method
   on `OpenShellService` and `InferenceService`.
2. **Wire lookups.** Replace `SCOPED_METHODS`, `ADMIN_METHODS`,
   `UNAUTHENTICATED_METHODS`, `ALLOWED_SANDBOX_METHODS` with calls through
   the aggregator. Existing unit tests in `authz.rs`, `oidc.rs`,
   `sandbox_methods.rs` keep exercising the public predicates and continue
   to pass.
3. **Router enforcement + exhaustiveness.** Add `is_user_callable` and the
   `Principal::User` check in `AuthGrpcRouter`. Emit the descriptor set
   from `build.rs`. Add the exhaustiveness test plus a router test that
   proves `openshell-admin` + `openshell:all` is rejected on every
   `sandbox`-annotated method.

### Backwards compatibility

The four old constants are removed in the same commit that introduces the
aggregator, so external state stays consistent. Behavior changes visible
to deployed gateways:

- Six sandbox-only methods (listed in the Problem Statement table) start
  rejecting Bearer users at the router. Before this change, those methods
  either succeeded on incomplete requests or surfaced `NotFound` /
  `InvalidArgument` from the handler. Nothing in the CLI or any user-facing
  flow calls them; only sandbox supervisors do, so no CLI or e2e regression
  is expected.
- A handful of provider-profile methods and `ExecSandboxInteractive` that
  previously fell back to `openshell:all` now have explicit scope/role.
  Pragmatically: `openshell:all` tokens still work; `provider:read`-only
  tokens gain access to `ListProviderProfiles` / `GetProviderProfile`.

### Risks and constraints

- **First proc macro in the workspace.** Adds ~1–2 s of build time for the
  macro crate. Mitigated by keeping the macro small and focused on auth
  metadata only.
- **Compiler diagnostics.** Proc-macro errors are noisier than const-table
  errors; the macro emits `compile_error!` spans pointing at the offending
  method.
- **Method-name convention.** Relies on tonic's snake_case → PascalCase
  convention. If a proto introduces a non-conventional method name later,
  the macro will need an explicit path override; the current proto surface
  doesn't require it.
- **`#[tonic::async_trait]` composition.** The macro must apply before
  `#[tonic::async_trait]`, parse the impl body, strip `#[rpc_auth]`
  attributes, and re-emit a clean impl so async_trait's expansion is
  unaffected. This is exercised by the full server test suite.

### Alternatives Considered

## Alternatives Considered

1. **Const tables per module, no proc macro.** Improves review locality but
   keeps the drift problems intact: paths stay hand-written, missing
   registrations still fall back silently, and auth mode stays in a separate
   file. The proc macro is worth the build-time cost specifically to fix
   those drift problems.

2. **`inventory` / linker-tricks distributed registration.** Avoids the
   manual aggregator one-liner but adds runtime startup work and platform
   fragility for marginal benefit. Rejected.

3. **Macro-free runtime registry built from the proto descriptor.**
   Defer the policy table to a build-time scan of the descriptor with
   external YAML attached. Loses the "auth metadata at the call site"
   property that motivated this work in the first place. Rejected.

4. **Declarative `macro_rules!` instead of a proc macro.** Can do most of
   the work but can't easily generate a canonical service-derived const
   name and has weaker diagnostics. Rejected as a worse trade-off.

5. **Just add the router check, skip the refactor.** Closes problem #1 in
   ~20 lines. Doesn't address problems #2 and #3. Considered as a
   point-fix; the team chose the structural fix because the asymmetry is
   a symptom of the source-of-truth split, not of one missing line.

### Agent Investigation


- Searched current and recent issues/PRs for OIDC, RBAC, scope, role,
  `SCOPED_METHODS`, sandbox principal, and per-handler auth terminology.
- Found PR #935 (closed) introduced `SCOPED_METHODS`, `ADMIN_METHODS`,
  and `UNAUTHENTICATED_METHODS` in `auth/authz.rs` and `auth/oidc.rs` as
  flat constants — the hand-maintained shape this proposal replaces.
- Found PR #1404 (merged) added the per-sandbox gateway-minted JWT,
  `Principal::Sandbox`, `ALLOWED_SANDBOX_METHODS` in `auth/sandbox_methods.rs`,
  and the handler-level `ensure_sandbox_scope` /
  `ensure_sandbox_principal_scope` guards. The router was given an
  `is_sandbox_callable` check on `Principal::Sandbox`, but the inverse
  `is_user_callable` was not added — that asymmetry is the gap problem #1
  describes.
- Found #1506 (open) tracks HA-compatible sandbox JWT refresh as the
  other follow-up from the #1404 review; it's orthogonal to this work
  (refresh-state replication, not method-level policy).
- Found #1470 (closed) covers a related streaming-method case
  (`ConnectSupervisor` / `RelayStream` being rejected when OIDC is
  enabled without mTLS) — same RPCs that today accept a bearer user's
  bidi stream open. The fix proposed here closes that opening for any
  caller without a sandbox principal, regardless of mTLS.
- Read `crates/openshell-server/src/multiplex.rs` to confirm the router
  evaluates `Principal::Sandbox` through `is_sandbox_callable` but routes
  `Principal::User` straight into `AuthzPolicy::check`, which only knows
  about role/scope — not auth mode.
- Read `crates/openshell-server/src/auth/guard.rs` to confirm
  `ensure_sandbox_scope` lets `Principal::User` through unconditionally,
  which is what lets `GetSandboxProviderEnvironment`,
  `ReportPolicyStatus`, `SubmitPolicyAnalysis`, and `PushSandboxLogs`
  reach their handler bodies as a user.
- Did not find an open umbrella issue covering per-handler auth metadata
  or the router-side `Principal::User` AuthMode check.
- Verified the gap end-to-end against a live local-up-cluster + in-cluster
  Keycloak using `scripts/test-keycloak-e2e.sh`: with an
  `openshell-admin` + `openshell:all` token, 6 of 9 sandbox-only methods
  (the six in the Problem Statement table) return non-`PermissionDenied`
  on `main` (handler reached; `PushSandboxLogs` fully succeeds with `{}`).
  The same probes return
  `PermissionDenied: this method requires a sandbox principal` on the
  proposed branch.


