fix(keycloak): delete wrong-type mapper before recreating audience mapper#359
Open
cwiklik wants to merge 1 commit into
Open
fix(keycloak): delete wrong-type mapper before recreating audience mapper#359cwiklik wants to merge 1 commit into
cwiklik wants to merge 1 commit into
Conversation
6abaa5e to
220b8ce
Compare
…pper (#358) When a protocol mapper exists with the correct name but the wrong type (not oidc-audience-mapper), the POST returns 409 and updateAudienceMapperIfNeeded fails to find a matching mapper — entering an infinite error loop that blocks audience scope propagation. ## Root Cause The 409 Conflict from Keycloak means "a mapper with that name already exists." But updateAudienceMapperIfNeeded only looks for mappers matching BOTH Name == scopeName AND ProtocolMapper == "oidc-audience-mapper". When the existing mapper has the right name but wrong type, the loop skips it and falls through to "no matching audience mapper found." This also prevents verifyAudienceMapper (defense-in-depth from PR #350) from running, since getOrCreateAudienceClientScope returns early on the error — no self-healing is possible. ## Fix In updateAudienceMapperIfNeeded, after failing to find an oidc-audience-mapper, perform a second pass looking for any mapper with a matching name (regardless of type). If found, DELETE it via the Keycloak Admin API, then re-POST the correct oidc-audience-mapper. This is the minimal targeted fix — it handles the exact broken state (wrong-type name collision) without restructuring the flow. ## Observed Symptoms - Operator logs: "ensure audience mapper for existing scope ... no matching audience mapper found" repeating every few seconds - Agent tokens lack the correct audience claim - AuthBridge/Envoy rejects requests with 401 Unauthorized - Affects fresh installs with operator v0.2.0-rc.4 Fixes #358 Signed-off-by: cwiklik <cwiklik@users.noreply.github.com> Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com> Signed-off-by: cwiklik <cwiklikj@gmail.com>
220b8ce to
95b7f28
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
protocolMappertype blocks creation — now deleted and recreated.verifyAudienceMapperhandle it on the next reconcile.Observed Symptoms
Operator logs on Kind cluster with v0.2.0-rc.4, deploying
a2a-currency-converteras a Deployment:{"level":"error","msg":"Keycloak audience scope management failed (credentials will still be written)", "controller":"clientregistration","Deployment":{"name":"a2a-currency-converter","namespace":"team1"}, "clientId":"spiffe://localtest.me/ns/team1/sa/a2a-currency-converter", "error":"ensure audience mapper for existing scope \"agent-team1-a2a-currency-converter-aud\": no matching audience mapper found for scope \"agent-team1-a2a-currency-converter-aud\" (scopeID 2ca02fe1-...)"}This repeats every few seconds indefinitely. The credential Secret IS written (client registration succeeds), but without the audience mapper the issued tokens lack the correct
audclaim → AuthBridge/Envoy rejects them → 401 Unauthorized on all A2A calls.Prior Work and Remaining Gap
updateAudienceMapperIfNeeded— updates audience value when mapper exists with correct type but staleincluded.custom.audienceverifyAudienceMapper— defense-in-depth GET+verify on every reconcile; error propagation (fix for #348)getOrCreateAudienceClientScopereturns error firstThe gap:
updateAudienceMapperIfNeededhas no recovery path when:protocolMappertype (causes name collision 409 but loop skips it)Root Cause Analysis
Call chain (before this fix)
Scenario 1: Wrong-type mapper
A mapper exists in the scope with
Name == "agent-team1-...-aud"butProtocolMapper == "oidc-hardcoded-claim-mapper"(or any non-audience type). This can happen from a prior partial failure or Keycloak migration.updateAudienceMapperIfNeededskips it (line 266: type filter)Scenario 2: Concurrent reconcile / phantom 409
Multiple reconcile goroutines fire simultaneously for the same Deployment (observed: 3-4 reconciles within 1 second on pod restart). One succeeds, others get 409. But by the time the losers list mappers, Keycloak may not have committed the winner's transaction yet → empty list.
Alternatively, Keycloak enforces mapper name uniqueness at the realm level across all client scopes. If the name was ever used in a scope that was later deleted, Keycloak's internal state may still reserve it (observed on Kind: DELETE scope → recreate scope → POST mapper → 409 "Duplicate resource error" despite empty scope).
The Fix
For Scenario 1 (wrong-type mapper):
For Scenario 2 (empty list after 409):
Why returning nil is safe
The
EnsureAudienceScopecall chain has two layers:When
updateAudienceMapperIfNeededreturns nil (scenario 2), control flows:ensureAudienceMapperreturns nilgetOrCreateAudienceClientScopereturns scopeID (success)verifyAudienceMapperruns — it independently GETs the mapper listensureAudienceMapperagain → if 409 repeats, returns nil again → no error logged, mapper will be created on the next full reconcile cycle when the Keycloak state clearsThis is safe because:
verifyAudienceMapperruns every reconcile cycle as defense-in-depthNew helper:
deleteMapperStandard DELETE to
protocol-mappers/models/{mapperID}. Accepts 204 (success) and 404 (already gone) as success.Local Testing Results
Tested on Kind cluster with operator v0.2.0-rc.4 → fix-358:
a2a-currency-converteraudience mapper error repeating every ~2s indefinitelyagent-team1-a2a-currency-converter-audcreated with correctoidc-audience-mapper, audience = SPIFFE URI, registered as realm default scope."Currency Agent" version 1.0.0)Test plan
go test ./internal/keycloak/— 9 tests pass (3 new)Fixes #358
Assisted-By: Claude Code