You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The gateway's gRPC auth metadata is currently spread across four hand-maintained
constants in three files:
Constant
File
Purpose
SCOPED_METHODS
auth/authz.rs
Bearer scope per method
ADMIN_METHODS
auth/authz.rs
Admin-vs-user role mapping
UNAUTHENTICATED_METHODS
auth/oidc.rs
Methods that bypass auth entirely
ALLOWED_SANDBOX_METHODS
auth/sandbox_methods.rs
Methods callable by sandbox principals
This has three concrete consequences:
Asymmetric router enforcement. The router rejects Principal::Sandbox
when the method isn't in ALLOWED_SANDBOX_METHODS, but it does not do
the inverse for Principal::User. A Bearer user with openshell:all (or
even just the right scope) is therefore not stopped at the router from
reaching a handler that is intended for sandbox supervisors. Several
handlers (GetSandboxProviderEnvironment, ReportPolicyStatus, SubmitPolicyAnalysis, PushSandboxLogs) rely on ensure_sandbox_scope,
which intentionally lets users through. The streaming RPCs
(ConnectSupervisor, RelayStream) don't gate at all before opening
their bidi channel. End-to-end against an in-cluster Keycloak with an openshell-admin + openshell:all token (full A/B captured below):
Method
main today
Expected
GetSandboxProviderEnvironment
NotFound: sandbox not found (handler reached, store queried)
PermissionDenied at router
ReportPolicyStatus
InvalidArgument: sandbox_id is required (handler validated body)
InvalidArgument: first RelayFrame must be init… (bidi stream opened)
PermissionDenied
IssueSandboxToken, RefreshSandboxToken, and GetInferenceBundle are
already safe today because their handlers call ensure_sandbox_principal_scope, but that safety is per-handler discipline,
not a structural guarantee.
Silent drift on every new RPC. A new RPC that lands in the proto but
doesn't get added to SCOPED_METHODS falls back to requiring openshell:all at runtime — no compile-time error, no test failure unless
someone wrote specific coverage for it. A method missing from ALLOWED_SANDBOX_METHODS silently denies sandbox supervisors. A typo in
one of the hand-written gRPC path strings (e.g. "/openshell.v1.OpenShell/CreateProvider") is undetectable until
production: the proto is the source of truth, but nothing connects the
two.
Auth posture is not visible at the handler. A reviewer reading handle_create_provider has no local signal that the method requires
admin role and provider:write scope — they have to cross-reference
three files.
Checklist
I've reviewed existing issues and the architecture docs.
This is a design proposal, not a "please build this" request.
Proposed Design
Proposed Design
Declare auth metadata at the handler definition site and enforce it at the
router. Replace the four constants with generated tables backed by per-method
annotations; enforce the missing user-side AuthMode check; gate everything
with a compile-time and a descriptor-set-driven test.
Auth model
Auth mode
Principal::Sandbox?
Bearer?
Scope applies?
Role applies?
Examples
unauthenticated
n/a
n/a
no
no
Health, gRPC reflection (handled by prefix, not annotation)
sandbox auth uses the per-sandbox gateway-minted JWT introduced in #1404 —
the old shared sandbox secret no longer exists. A handler annotated sandbox
authenticates as a specific Principal::Sandbox; handlers still perform a
same-sandbox check on the request body where applicable.
Roles are coarse (admin or user). Scopes are fine-grained (sandbox:read, provider:write, etc.).
#[rpc_authz] is a new impl-level attribute macro (first proc macro in the
workspace, in a small openshell-server-macros crate). It inspects each
method's #[rpc_auth] attribute and emits, adjacent to the impl block:
The const name is derived from the trait identifier in the impl
(impl OpenShell for ... → OPEN_SHELL_AUTH_METADATA). Paths are derived
from the service = "..." argument and the snake_case method name converted
to PascalCase, so they cannot drift from the proto.
The macro strips #[rpc_auth(...)] attributes from the methods before
re-emitting the impl block, so #[tonic::async_trait] sees a normal impl.
MethodAuth, AuthMode, and Role live in openshell-server
(auth/method_authz.rs). The macro emits crate::auth::method_authz::*
paths; that only needs to work from inside openshell-server.
Compile-time enforcement
#[rpc_authz] fails compilation when:
An RPC method is missing #[rpc_auth].
An auth = "unauthenticated" or auth = "sandbox" method is annotated
with scope or role.
An auth = "bearer" or auth = "dual" method is missing scope or role.
Two methods on the same service produce the same path.
The same key (auth, scope, or role) appears twice in one #[rpc_auth(...)].
An invalid auth mode or role string is supplied.
Aggregation
The macro emits one pub const per service. Aggregation is a manual one-liner
in a new module auth/method_authz.rs:
This is the single source of truth queried by authz.rs, oidc.rs, and sandbox_methods.rs. No inventory crate, no linker tricks, no runtime
initialization.
AuthGrpcRouter already checks is_sandbox_callable for Principal::Sandbox.
Add the mirror for Principal::User via a new method_authz::is_user_callable(path):
is_user_callable returns true for Bearer / Dual (the only modes a
user principal should reach), false for Sandbox / Unauthenticated,
and true for unknown methods so AuthzPolicy::check still gets to apply
the openshell:all fallback (defense-in-depth, see below).
What this replaces
Today
After
SCOPED_METHODS in auth/authz.rs
method_authz::required_scope() reading from generated tables
ADMIN_METHODS in auth/authz.rs
method_authz::required_role() reading from generated tables
UNAUTHENTICATED_METHODS in auth/oidc.rs
method_authz::is_unauthenticated() reading from generated tables
ALLOWED_SANDBOX_METHODS in auth/sandbox_methods.rs
method_authz::is_sandbox_callable() reading from generated tables
UNAUTHENTICATED_PREFIXES in auth/oidc.rs
Stays — prefix matching for /grpc.reflection.* and /grpc.health.* is structural, not per-method
openshell-core/build.rs is extended to emit a binary FileDescriptorSet
via tonic_build::configure().file_descriptor_set_path(...). The descriptor
is exposed as openshell_core::FILE_DESCRIPTOR_SET (a &'static [u8]).
A test in openshell-server parses the descriptor, enumerates every (service, method) pair, and verifies each one is covered exactly once by
the aggregated MethodAuth tables (or matches one of the prefix-bypassed
paths). Failure modes:
A new RPC is added to a proto but no annotation lands → test fails loudly.
A method appears with two different annotations across services → test fails.
An annotated path doesn't match any real proto RPC → test fails (catches
stale annotations after a rename).
The exhaustiveness test is the primary safety net. The runtime keeps the openshell:all fallback for unknown methods (preserved unknown_method_requires_openshell_all test) as defense in depth: if a
future refactor introduces a code path the test can't see (e.g. a method
routed through the server without appearing in the gateway-facing
descriptor set, or the aggregation list drifts), an unknown method still
requires the all-scope rather than falling open. The two layers are
deliberate and complementary.
Implementation outline
Macro crate, types, annotations. Add crates/openshell-server-macros/
with #[rpc_authz] + #[rpc_auth]. Add MethodAuth, AuthMode, Role,
and the aggregator in auth/method_authz.rs. Annotate every RPC method
on OpenShellService and InferenceService.
Wire lookups. Replace SCOPED_METHODS, ADMIN_METHODS, UNAUTHENTICATED_METHODS, ALLOWED_SANDBOX_METHODS with calls through
the aggregator. Existing unit tests in authz.rs, oidc.rs, sandbox_methods.rs keep exercising the public predicates and continue
to pass.
Router enforcement + exhaustiveness. Add is_user_callable and the Principal::User check in AuthGrpcRouter. Emit the descriptor set
from build.rs. Add the exhaustiveness test plus a router test that
proves openshell-admin + openshell:all is rejected on every sandbox-annotated method.
Backwards compatibility
The four old constants are removed in the same commit that introduces the
aggregator, so external state stays consistent. Behavior changes visible
to deployed gateways:
Six sandbox-only methods (listed in the table above) start rejecting
Bearer users at the router. Before this change, those methods either
succeeded on incomplete requests or surfaced NotFound / InvalidArgument
from the handler. Nothing in the CLI or any user-facing flow calls them;
only sandbox supervisors do. No CLI or e2e regression.
A handful of provider-profile methods and ExecSandboxInteractive that
previously fell back to openshell:all now have explicit scope/role.
Pragmatically: openshell:all tokens still work; provider:read-only
tokens gain access to ListProviderProfiles / GetProviderProfile.
Risks and constraints
First proc macro in the workspace. Adds ~1–2 s of build time for the
macro crate. Mitigated by keeping the macro small and focused on auth
metadata only.
Compiler diagnostics. Proc-macro errors are noisier than const-table
errors; the macro emits compile_error! spans pointing at the offending
method.
Method-name convention. Relies on tonic's snake_case → PascalCase
convention. If a proto introduces a non-conventional method name later,
the macro will need an explicit path override; the current proto surface
doesn't require it.
#[tonic::async_trait] composition. The macro must apply before #[tonic::async_trait], parse the impl body, strip #[rpc_auth]
attributes, and re-emit a clean impl so async_trait's expansion is
unaffected. This is exercised by the full server test suite.
Alternatives Considered
Alternatives Considered
Const tables per module, no proc macro. Improves review locality but
keeps the drift problems intact: paths stay hand-written, missing
registrations still fall back silently, auth mode stays in a separate
file. The proc macro is worth the build-time cost specifically to fix
those two issues.
inventory / linker-tricks distributed registration. Avoids the
manual aggregator one-liner but adds runtime startup work and platform
fragility for marginal benefit. Rejected.
Macro-free runtime registry built from the proto descriptor.
Defer the policy table to a build-time scan of the descriptor with
external YAML attached. Loses the "auth metadata at the call site"
property that motivated this work in the first place. Rejected.
Declarative macro_rules! instead of a proc macro. Can do most of
the work but can't easily generate a canonical service-derived const
name and has weaker diagnostics. Rejected as a worse trade-off.
Found PR feat(auth): per-sandbox authentication to gateway #1404 (merged) added the per-sandbox gateway-minted JWT, Principal::Sandbox, ALLOWED_SANDBOX_METHODS in auth/sandbox_methods.rs,
and the handler-level ensure_sandbox_scope / ensure_sandbox_principal_scope guards. The router was given an is_sandbox_callable check on Principal::Sandbox, but the inverse is_user_callable was not added — that asymmetry is the gap problem ci: add GitHub Actions CI workflow with lint, test, and image build #1
describes.
Found Supervisor ConnectSupervisor and RelayStream RPCs rejected when OIDC is enabled without mTLS #1470 (closed) covers a related streaming-method case
(ConnectSupervisor / RelayStream being rejected when OIDC is
enabled without mTLS) — same RPCs that today accept a bearer user's
bidi stream open. The fix proposed here closes that opening for any
caller without a sandbox principal, regardless of mTLS.
Read crates/openshell-server/src/multiplex.rs to confirm the router
evaluates Principal::Sandbox through is_sandbox_callable but routes Principal::User straight into AuthzPolicy::check, which only knows
about role/scope — not auth mode.
Read crates/openshell-server/src/auth/guard.rs to confirm ensure_sandbox_scope lets Principal::User through unconditionally,
which is what lets GetSandboxProviderEnvironment, ReportPolicyStatus, SubmitPolicyAnalysis, and PushSandboxLogs
reach their handler bodies as a user.
Did not find an open umbrella issue covering per-handler auth metadata
or the router-side Principal::User AuthMode check.
Verified the gap end-to-end against a live local-up-cluster + in-cluster
Keycloak using scripts/test-keycloak-e2e.sh: with an openshell-admin + openshell:all token, 8 of 9 sandbox-only methods
return non-PermissionDenied on main (handler reached; PushSandboxLogs
fully succeeds with {}). The same probes return PermissionDenied: this method requires a sandbox principal on the
proposed branch.
Checklist
I've reviewed existing issues and the architecture docs
This is a design proposal, not a "please build this" request
Problem Statement
Issue draft: Per-Handler gRPC Auth Annotations
Problem Statement
The gateway's gRPC auth metadata is currently spread across four hand-maintained
constants in three files:
SCOPED_METHODSauth/authz.rsADMIN_METHODSauth/authz.rsUNAUTHENTICATED_METHODSauth/oidc.rsALLOWED_SANDBOX_METHODSauth/sandbox_methods.rsThis has three concrete consequences:
Asymmetric router enforcement. The router rejects
Principal::Sandboxwhen the method isn't in
ALLOWED_SANDBOX_METHODS, but it does not dothe inverse for
Principal::User. A Bearer user withopenshell:all(oreven just the right scope) is therefore not stopped at the router from
reaching a handler that is intended for sandbox supervisors. Several
handlers (
GetSandboxProviderEnvironment,ReportPolicyStatus,SubmitPolicyAnalysis,PushSandboxLogs) rely onensure_sandbox_scope,which intentionally lets users through. The streaming RPCs
(
ConnectSupervisor,RelayStream) don't gate at all before openingtheir bidi channel. End-to-end against an in-cluster Keycloak with an
openshell-admin+openshell:alltoken (full A/B captured below):GetSandboxProviderEnvironmentNotFound: sandbox not found(handler reached, store queried)PermissionDeniedat routerReportPolicyStatusInvalidArgument: sandbox_id is required(handler validated body)PermissionDeniedSubmitPolicyAnalysisInvalidArgument: name is requiredPermissionDeniedPushSandboxLogs{}— stream accepted, log push succeededPermissionDeniedConnectSupervisorInvalidArgument: expected SupervisorHello(bidi stream opened)PermissionDeniedRelayStreamInvalidArgument: first RelayFrame must be init…(bidi stream opened)PermissionDeniedIssueSandboxToken,RefreshSandboxToken, andGetInferenceBundlearealready safe today because their handlers call
ensure_sandbox_principal_scope, but that safety is per-handler discipline,not a structural guarantee.
Silent drift on every new RPC. A new RPC that lands in the proto but
doesn't get added to
SCOPED_METHODSfalls back to requiringopenshell:allat runtime — no compile-time error, no test failure unlesssomeone wrote specific coverage for it. A method missing from
ALLOWED_SANDBOX_METHODSsilently denies sandbox supervisors. A typo inone of the hand-written gRPC path strings (e.g.
"/openshell.v1.OpenShell/CreateProvider") is undetectable untilproduction: the proto is the source of truth, but nothing connects the
two.
Auth posture is not visible at the handler. A reviewer reading
handle_create_providerhas no local signal that the method requiresadmin role and
provider:writescope — they have to cross-referencethree files.
Checklist
Proposed Design
Proposed Design
Declare auth metadata at the handler definition site and enforce it at the
router. Replace the four constants with generated tables backed by per-method
annotations; enforce the missing user-side AuthMode check; gate everything
with a compile-time and a descriptor-set-driven test.
Auth model
Principal::Sandbox?unauthenticatedHealth, gRPC reflection (handled by prefix, not annotation)sandboxReportPolicyStatus,PushSandboxLogs,GetInferenceBundlebearerListSandboxes,CreateProvider,SetClusterInferencedualGetSandboxConfig,UpdateConfig,GetDraftPolicysandboxauth uses the per-sandbox gateway-minted JWT introduced in #1404 —the old shared sandbox secret no longer exists. A handler annotated
sandboxauthenticates as a specific
Principal::Sandbox; handlers still perform asame-sandbox check on the request body where applicable.
Roles are coarse (
adminoruser). Scopes are fine-grained (sandbox:read,provider:write, etc.).Per-handler annotation
What the macros generate
#[rpc_authz]is a new impl-level attribute macro (first proc macro in theworkspace, in a small
openshell-server-macroscrate). It inspects eachmethod's
#[rpc_auth]attribute and emits, adjacent to the impl block:The const name is derived from the trait identifier in the impl
(
impl OpenShell for ...→OPEN_SHELL_AUTH_METADATA). Paths are derivedfrom the
service = "..."argument and the snake_case method name convertedto PascalCase, so they cannot drift from the proto.
The macro strips
#[rpc_auth(...)]attributes from the methods beforere-emitting the impl block, so
#[tonic::async_trait]sees a normal impl.MethodAuth,AuthMode, andRolelive inopenshell-server(
auth/method_authz.rs). The macro emitscrate::auth::method_authz::*paths; that only needs to work from inside
openshell-server.Compile-time enforcement
#[rpc_authz]fails compilation when:#[rpc_auth].auth = "unauthenticated"orauth = "sandbox"method is annotatedwith
scopeorrole.auth = "bearer"orauth = "dual"method is missingscopeorrole.auth,scope, orrole) appears twice in one#[rpc_auth(...)].Aggregation
The macro emits one
pub constper service. Aggregation is a manual one-linerin a new module
auth/method_authz.rs:This is the single source of truth queried by
authz.rs,oidc.rs, andsandbox_methods.rs. Noinventorycrate, no linker tricks, no runtimeinitialization.
Router enforcement (closes problem #1)
AuthGrpcRouteralready checksis_sandbox_callableforPrincipal::Sandbox.Add the mirror for
Principal::Uservia a newmethod_authz::is_user_callable(path):is_user_callablereturnstrueforBearer/Dual(the only modes auser principal should reach),
falseforSandbox/Unauthenticated,and
truefor unknown methods soAuthzPolicy::checkstill gets to applythe
openshell:allfallback (defense-in-depth, see below).What this replaces
SCOPED_METHODSinauth/authz.rsmethod_authz::required_scope()reading from generated tablesADMIN_METHODSinauth/authz.rsmethod_authz::required_role()reading from generated tablesUNAUTHENTICATED_METHODSinauth/oidc.rsmethod_authz::is_unauthenticated()reading from generated tablesALLOWED_SANDBOX_METHODSinauth/sandbox_methods.rsmethod_authz::is_sandbox_callable()reading from generated tablesUNAUTHENTICATED_PREFIXESinauth/oidc.rs/grpc.reflection.*and/grpc.health.*is structural, not per-methodExhaustiveness test (closes problem #2)
openshell-core/build.rsis extended to emit a binaryFileDescriptorSetvia
tonic_build::configure().file_descriptor_set_path(...). The descriptoris exposed as
openshell_core::FILE_DESCRIPTOR_SET(a&'static [u8]).A test in
openshell-serverparses the descriptor, enumerates every(service, method)pair, and verifies each one is covered exactly once bythe aggregated
MethodAuthtables (or matches one of the prefix-bypassedpaths). Failure modes:
stale annotations after a rename).
The exhaustiveness test is the primary safety net. The runtime keeps the
openshell:allfallback for unknown methods (preservedunknown_method_requires_openshell_alltest) as defense in depth: if afuture refactor introduces a code path the test can't see (e.g. a method
routed through the server without appearing in the gateway-facing
descriptor set, or the aggregation list drifts), an unknown method still
requires the all-scope rather than falling open. The two layers are
deliberate and complementary.
Implementation outline
crates/openshell-server-macros/with
#[rpc_authz]+#[rpc_auth]. AddMethodAuth,AuthMode,Role,and the aggregator in
auth/method_authz.rs. Annotate every RPC methodon
OpenShellServiceandInferenceService.SCOPED_METHODS,ADMIN_METHODS,UNAUTHENTICATED_METHODS,ALLOWED_SANDBOX_METHODSwith calls throughthe aggregator. Existing unit tests in
authz.rs,oidc.rs,sandbox_methods.rskeep exercising the public predicates and continueto pass.
is_user_callableand thePrincipal::Usercheck inAuthGrpcRouter. Emit the descriptor setfrom
build.rs. Add the exhaustiveness test plus a router test thatproves
openshell-admin+openshell:allis rejected on everysandbox-annotated method.Backwards compatibility
The four old constants are removed in the same commit that introduces the
aggregator, so external state stays consistent. Behavior changes visible
to deployed gateways:
Bearer users at the router. Before this change, those methods either
succeeded on incomplete requests or surfaced
NotFound/InvalidArgumentfrom the handler. Nothing in the CLI or any user-facing flow calls them;
only sandbox supervisors do. No CLI or e2e regression.
ExecSandboxInteractivethatpreviously fell back to
openshell:allnow have explicit scope/role.Pragmatically:
openshell:alltokens still work;provider:read-onlytokens gain access to
ListProviderProfiles/GetProviderProfile.Risks and constraints
macro crate. Mitigated by keeping the macro small and focused on auth
metadata only.
errors; the macro emits
compile_error!spans pointing at the offendingmethod.
convention. If a proto introduces a non-conventional method name later,
the macro will need an explicit path override; the current proto surface
doesn't require it.
#[tonic::async_trait]composition. The macro must apply before#[tonic::async_trait], parse the impl body, strip#[rpc_auth]attributes, and re-emit a clean impl so async_trait's expansion is
unaffected. This is exercised by the full server test suite.
Alternatives Considered
Alternatives Considered
Const tables per module, no proc macro. Improves review locality but
keeps the drift problems intact: paths stay hand-written, missing
registrations still fall back silently, auth mode stays in a separate
file. The proc macro is worth the build-time cost specifically to fix
those two issues.
inventory/ linker-tricks distributed registration. Avoids themanual aggregator one-liner but adds runtime startup work and platform
fragility for marginal benefit. Rejected.
Macro-free runtime registry built from the proto descriptor.
Defer the policy table to a build-time scan of the descriptor with
external YAML attached. Loses the "auth metadata at the call site"
property that motivated this work in the first place. Rejected.
Declarative
macro_rules!instead of a proc macro. Can do most ofthe work but can't easily generate a canonical service-derived const
name and has weaker diagnostics. Rejected as a worse trade-off.
Just add the router check, skip the refactor. Closes problem ci: add GitHub Actions CI workflow with lint, test, and image build #1 in
~20 lines. Doesn't address problems Prototype data access layer that can connect to outlook data #2 and Sandbox logging #3. Considered as a
point-fix; the team chose the structural fix because the asymmetry is
a symptom of the source-of-truth split, not of one missing line.
Agent Investigation
SCOPED_METHODS, sandbox principal, and per-handler auth terminology.SCOPED_METHODS,ADMIN_METHODS,and
UNAUTHENTICATED_METHODSinauth/authz.rsandauth/oidc.rsasflat constants — the hand-maintained shape this proposal replaces.
Principal::Sandbox,ALLOWED_SANDBOX_METHODSinauth/sandbox_methods.rs,and the handler-level
ensure_sandbox_scope/ensure_sandbox_principal_scopeguards. The router was given anis_sandbox_callablecheck onPrincipal::Sandbox, but the inverseis_user_callablewas not added — that asymmetry is the gap problem ci: add GitHub Actions CI workflow with lint, test, and image build #1describes.
other follow-up from the feat(auth): per-sandbox authentication to gateway #1404 review; it's orthogonal to this work
(refresh-state replication, not method-level policy).
(
ConnectSupervisor/RelayStreambeing rejected when OIDC isenabled without mTLS) — same RPCs that today accept a bearer user's
bidi stream open. The fix proposed here closes that opening for any
caller without a sandbox principal, regardless of mTLS.
crates/openshell-server/src/multiplex.rsto confirm the routerevaluates
Principal::Sandboxthroughis_sandbox_callablebut routesPrincipal::Userstraight intoAuthzPolicy::check, which only knowsabout role/scope — not auth mode.
crates/openshell-server/src/auth/guard.rsto confirmensure_sandbox_scopeletsPrincipal::Userthrough unconditionally,which is what lets
GetSandboxProviderEnvironment,ReportPolicyStatus,SubmitPolicyAnalysis, andPushSandboxLogsreach their handler bodies as a user.
or the router-side
Principal::UserAuthMode check.Keycloak using
scripts/test-keycloak-e2e.sh: with anopenshell-admin+openshell:alltoken, 8 of 9 sandbox-only methodsreturn non-
PermissionDeniedonmain(handler reached;PushSandboxLogsfully succeeds with
{}). The same probes returnPermissionDenied: this method requires a sandbox principalon theproposed branch.
Checklist