Skip to content

Conversation

@jeet1995
Copy link
Member

@jeet1995 jeet1995 commented Nov 12, 2025

Description

The objective of this pull request is to fail document operations when their enclosed barrier requests hit a 410 Lease Not Found. This will help with quicker cross region retries for reads and also curb resource utilization by bailing out of aggressive barrier retries on the client side.

Approach

For reads:

  • A quorum of Head requests is performed. If from within a quorum, a single response is a 410 Lease Not Found, one last Head request is performed on the primary replica post a force address refresh. If the Head request on the primary also returns a 410 Lease Not Found then enclosing request (a non-write request) is failed with a 503 Lease Not Found. If a barrier from primary replica cannot be satisfied, the barrier attainment loop is reentered.
sequenceDiagram
    autonumber
    participant Caller
    participant QuorumReader
    participant StoreReader
    participant BarrierHelper as BarrierRequestHelper
    
    Note over Caller,StoreReader: readQuorumAsync Flow
    
    Caller->>+QuorumReader: readQuorumAsync(entity, readQuorum, includePrimary, readMode)
    QuorumReader->>QuorumReader: Check if timeout elapsed
    QuorumReader->>QuorumReader: ensureQuorumSelectedStoreResponse()
    
    alt quorumSelectedStoreResponse is null
        QuorumReader->>+StoreReader: readMultipleReplicaAsync(entity, includePrimary, readQuorum)
        StoreReader-->>-QuorumReader: List<StoreResult> responses
        
        QuorumReader->>QuorumReader: Collect storeResponses for logging
        
        rect rgb(255, 200, 200)
        Note over QuorumReader,Caller: ⚠️ AvoidQuorumSelectionException Path
        alt Has AvoidQuorumSelectionException
            QuorumReader-->>Caller: ReadQuorumResult(QuorumNotPossibleInCurrentRegion)
        else responseCount < readQuorum
            QuorumReader-->>Caller: ReadQuorumResult(QuorumNotSelected)
        else isQuorumMet() returns true
            QuorumReader->>QuorumReader: Select response with max LSN
            QuorumReader-->>Caller: ReadQuorumResult(QuorumMet)
        else Need barrier (quorum not met)
            QuorumReader->>QuorumReader: Select readLsn, globalCommittedLSN, storeResult
        end
        end
    else quorumSelectedStoreResponse exists
        QuorumReader->>QuorumReader: Use cached readLsn, globalCommittedLSN, storeResult
    end
    
    Note over QuorumReader,BarrierHelper: Barrier Required Path
    
    QuorumReader->>+BarrierHelper: createAsync(entity, readLsn, globalCommittedLSN)
    BarrierHelper-->>-QuorumReader: barrierRequest
    QuorumReader->>QuorumReader: waitForReadBarrierAsync(barrierRequest, false, readQuorum, readLsn, globalCommittedLSN)
    
    Note over QuorumReader,StoreReader: waitForReadBarrierAsync Flow (Initial Loop)
    
    loop maxNumberOfReadBarrierReadRetries times
        QuorumReader->>QuorumReader: Check if timeout elapsed
        QuorumReader->>+StoreReader: readMultipleReplicaAsync(barrierRequest, allowPrimary, readQuorum, forceReadAll=true)
        StoreReader-->>-QuorumReader: List<StoreResult> responses
        
        rect rgb(255, 200, 200)
        Note over QuorumReader,StoreReader: ⚠️ AvoidQuorumSelectionException Path
        alt Has AvoidQuorumSelectionException
            QuorumReader->>QuorumReader: isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException()
            QuorumReader->>QuorumReader: performBarrierOnPrimaryAndDetermineIfBarrierCanBeSatisfied()
            QuorumReader->>+StoreReader: readPrimaryAsync(entity, requiresValidLsn=true)
            StoreReader-->>-QuorumReader: StoreResult from primary
            
            alt Primary has required LSN and GlobalCommittedLSN
                QuorumReader->>QuorumReader: Set bailFromReadBarrierLoop=true, cosmosException=null
                Note over QuorumReader: Barrier Met - Exit Loop
            else Primary doesn't have required LSN
                QuorumReader->>QuorumReader: Set bailFromReadBarrierLoop=true, cosmosException=SERVICE_UNAVAILABLE
                Note over QuorumReader: Fail Fast - Exit Loop
            end
        else Responses have sufficient LSN
            QuorumReader->>QuorumReader: Check: count(lsn >= readBarrierLsn) >= readQuorum
            QuorumReader->>QuorumReader: Check: maxGlobalCommittedLsn >= targetGlobalCommittedLSN (if applicable)
            
            alt Both conditions met
                Note over QuorumReader: Barrier Met - Exit Loop
            else Conditions not met
                QuorumReader->>QuorumReader: Update maxGlobalCommittedLsn
                QuorumReader->>QuorumReader: Decrement readBarrierRetryCount
                
                alt readBarrierRetryCount == 0
                    Note over QuorumReader: Single-region retries exhausted
                else readBarrierRetryCount > 0
                    QuorumReader->>QuorumReader: Delay delayBetweenReadBarrierCallsInMs
                    Note over QuorumReader: Continue retry loop
                end
            end
        end
        end
    end
    
    Note over QuorumReader,StoreReader: Multi-Region Barrier (if targetGlobalCommittedLSN > 0)
    
    alt targetGlobalCommittedLSN > 0 AND initial barrier failed
        loop maxBarrierRetriesForMultiRegion times
            QuorumReader->>QuorumReader: Check if timeout elapsed
            QuorumReader->>+StoreReader: readMultipleReplicaAsync(barrierRequest, allowPrimary, readQuorum, forceReadAll=true)
            StoreReader-->>-QuorumReader: List<StoreResult> responses
            
            rect rgb(255, 200, 200)
            Note over QuorumReader,StoreReader: ⚠️ AvoidQuorumSelectionException Path
            alt Has AvoidQuorumSelectionException
                QuorumReader->>QuorumReader: isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException()
                Note over QuorumReader: Same as above
            else Multi-region barrier conditions met
                QuorumReader->>QuorumReader: Check: count(lsn >= readBarrierLsn) >= readQuorum
                QuorumReader->>QuorumReader: Check: maxGlobalCommittedLsn >= targetGlobalCommittedLSN
                
                alt Both conditions met
                    Note over QuorumReader: Global Strong Barrier Met
                else Conditions not met
                    QuorumReader->>QuorumReader: Update maxGlobalCommittedLsn
                    QuorumReader->>QuorumReader: Decrement readBarrierRetryCountMultiRegion
                    
                    alt readBarrierRetryCountMultiRegion == 0
                        Note over QuorumReader: Multi-region retries exhausted
                    else readBarrierRetryCountMultiRegion > maxShortBarrierRetriesForMultiRegion
                        QuorumReader->>QuorumReader: Delay barrierRetryIntervalInMsForMultiRegion
                    else
                        QuorumReader->>QuorumReader: Delay shortBarrierRetryIntervalInMsForMultiRegion
                    end
                end
            end
            end
        end
    end
    
    QuorumReader->>QuorumReader: waitForReadBarrierAsync returns Boolean
    
    alt bailOnBarrier AND cosmosException != null
        QuorumReader-->>Caller: ReadQuorumResult(QuorumNotPossibleInCurrentRegion, failFastException)
    else waitForReadBarrier == false
        QuorumReader-->>Caller: ReadQuorumResult(QuorumSelected, readLsn, globalCommittedLSN)
    else waitForReadBarrier == true
        QuorumReader-->>-Caller: ReadQuorumResult(QuorumMet, readLsn, globalCommittedLSN)
    end
Loading

For writes:

  • A cycle of a single Head request is performed. If the response is a 410 Lease Not Found, one last Head request is performed on the primary replica post a force address refresh. If the Head request on the primary also returns a 410 Lease Not Found then the enclosing request (a write request) is failed with a 408 Lease Not Found. If a barrier from primary replica cannot be satisfied, the barrier attainment loop is reentered.
sequenceDiagram
    autonumber
    participant Caller
    participant ConsistencyWriter
    participant StoreReader
    participant BarrierHelper as BarrierRequestHelper
    
    Note over Caller,StoreReader: barrierForGlobalStrong Flow
    
    Caller->>+ConsistencyWriter: barrierForGlobalStrong(request, response)
    ConsistencyWriter->>ConsistencyWriter: Check if ReplicatedResourceClient.isGlobalStrongEnabled()
    ConsistencyWriter->>ConsistencyWriter: Check isGlobalStrongRequest(request, response)
    
    alt Not a Global Strong Request
        ConsistencyWriter-->>Caller: Return original response
    else Is Global Strong Request
        ConsistencyWriter->>ConsistencyWriter: getLsnAndGlobalCommittedLsn(response)
        
        alt lsn == -1 OR globalCommittedLsn == -1
            ConsistencyWriter-->>Caller: Error: GoneException (SERVER_GENERATED_410)
        else Valid LSN values
            ConsistencyWriter->>ConsistencyWriter: Cache response in request.requestContext.globalStrongWriteResponse
            ConsistencyWriter->>ConsistencyWriter: Set request.requestContext.globalCommittedSelectedLSN = lsn
            ConsistencyWriter->>ConsistencyWriter: Set forceRefreshAddressCache = false
            
            alt globalCommittedLsn >= lsn
                Note over ConsistencyWriter: No barrier needed - already globally committed
                ConsistencyWriter-->>Caller: Return globalStrongWriteResponse
            else globalCommittedLsn < lsn
                Note over ConsistencyWriter: Barrier Required - Write region ahead of read regions
                ConsistencyWriter->>+BarrierHelper: createAsync(request, null, globalCommittedSelectedLSN)
                BarrierHelper-->>-ConsistencyWriter: barrierRequest
                
                ConsistencyWriter->>ConsistencyWriter: waitForWriteBarrierAsync(barrierRequest, globalCommittedSelectedLSN)
                
                Note over ConsistencyWriter,StoreReader: waitForWriteBarrierAsync Flow
                
                loop MAX_NUMBER_OF_WRITE_BARRIER_READ_RETRIES times
                    ConsistencyWriter->>ConsistencyWriter: Check if timeout elapsed
                    ConsistencyWriter->>+StoreReader: readMultipleReplicaAsync(barrierRequest, allowPrimary=true, readQuorum=1)
                    StoreReader-->>-ConsistencyWriter: List<StoreResult> responses
                    
                    rect rgb(255, 200, 200)
                    Note over ConsistencyWriter,StoreReader: ⚠️ AvoidQuorumSelectionException Path
                    alt Has AvoidQuorumSelectionException
                        ConsistencyWriter->>ConsistencyWriter: isBarrierMeetPossibleInPresenceOfAvoidQuorumSelectionException()
                        ConsistencyWriter->>ConsistencyWriter: performBarrierOnPrimaryAndDetermineIfBarrierCanBeSatisfied()
                        ConsistencyWriter->>ConsistencyWriter: Set forceRefreshAddressCache = true
                        ConsistencyWriter->>+StoreReader: readPrimaryAsync(entity, requiresValidLsn=true)
                        StoreReader-->>-ConsistencyWriter: StoreResult from primary
                        
                        alt Primary has required GlobalCommittedLSN
                            ConsistencyWriter->>ConsistencyWriter: Set bailFromWriteBarrierLoop=true, cosmosException=null
                            Note over ConsistencyWriter: Barrier Met - Exit Loop
                        else Primary doesn't have required GlobalCommittedLSN
                            ConsistencyWriter->>ConsistencyWriter: Set bailFromWriteBarrierLoop=true
                            ConsistencyWriter->>ConsistencyWriter: Set cosmosException=REQUEST_TIMEOUT with original subStatusCode
                            Note over ConsistencyWriter: Fail Fast - Exit Loop
                        end
                    else Check responses for barrier satisfaction
                        alt Any response.globalCommittedLSN >= selectedGlobalCommittedLsn
                            Note over ConsistencyWriter: Barrier Met - Return true
                        else No sufficient globalCommittedLSN
                            ConsistencyWriter->>ConsistencyWriter: Update maxGlobalCommittedLsnReceived
                            ConsistencyWriter->>ConsistencyWriter: Set forceRefreshAddressCache = false
                            ConsistencyWriter->>ConsistencyWriter: Decrement writeBarrierRetryCount
                            
                            alt writeBarrierRetryCount == 0
                                Note over ConsistencyWriter: All retries exhausted
                            else writeBarrierRetryCount > MAX_SHORT_BARRIER_RETRIES
                                ConsistencyWriter->>ConsistencyWriter: Delay DELAY_BETWEEN_WRITE_BARRIER_CALLS_IN_MS (30ms)
                                Note over ConsistencyWriter: Continue retry loop
                            else writeBarrierRetryCount <= MAX_SHORT_BARRIER_RETRIES
                                ConsistencyWriter->>ConsistencyWriter: Delay SHORT_BARRIER_RETRY_INTERVAL_IN_MS (10ms)
                                Note over ConsistencyWriter: Continue retry loop (short interval)
                            end
                        end
                    end
                    end
                end
                
                ConsistencyWriter->>ConsistencyWriter: waitForWriteBarrierAsync returns Boolean
                
                alt Barrier satisfied (true)
                    ConsistencyWriter-->>Caller: Return globalStrongWriteResponse
                else Barrier not satisfied (false)
                    alt cosmosException != null
                        ConsistencyWriter-->>Caller: Error: cosmosException (REQUEST_TIMEOUT)
                    else No exception
                        ConsistencyWriter-->>Caller: Error: GoneException (GLOBAL_STRONG_WRITE_BARRIER_NOT_MET)
                    end
                end
            end
        end
    end
    deactivate ConsistencyWriter
Loading

NOTE

This PR does not bail out barrier requests when a quorum of document requests could not be achieved and read is being performed on purely the primary replica (QuorumNotSelected flow).

Fixes

#46135

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995 jeet1995 changed the title Fail fast barrier lease not found Bail out from barriers when barriers hit 410 Lease Not Found. Nov 14, 2025
@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@jeet1995 jeet1995 marked this pull request as ready for review November 18, 2025 01:24
Copilot finished reviewing on behalf of jeet1995 November 18, 2025 01:28
@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements fail-fast behavior for barrier requests that encounter 410 "Lease Not Found" errors in Strong Consistency scenarios. The key improvement is avoiding unnecessary retries when barriers hit persistent lease-not-found errors by performing a final barrier check on the primary replica after a forced address refresh.

Key Changes:

  • Added logic in QuorumReader and ConsistencyWriter to detect AvoidQuorumSelectionException scenarios during barrier flows and bail out with appropriate errors (503 for reads, 408 for writes)
  • Removed conditional logic in StoreReader that previously excluded barrier requests from fail-fast handling
  • Added comprehensive test coverage with new multi-region-strong test profile and ExitFromConsistencyLayerTests

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
QuorumReader.java Added fail-fast logic for read barriers encountering 410 errors, with primary replica validation
ConsistencyWriter.java Added fail-fast logic for write barriers encountering 410 errors, with primary replica validation
StoreReader.java Removed barrier request exclusion from fail-fast exception handling
StoreResponse.java Added setHeaderValue method for test response manipulation
RntbdTransportClient.java Added copy constructor for test infrastructure
ExitFromConsistencyLayerTests.java New comprehensive test suite validating barrier bailout behavior
StoreResponseInterceptorUtils.java Test utilities for injecting controlled failures during barrier flows
RntbdTransportClientWithStoreResponseInterceptor.java Test wrapper enabling response interception
ReflectionUtils.java Added helper method to access StoreReader from ConsistencyWriter
TestSuiteBase.java Added multi-region-strong test group registration
live-platform-matrix.json Added test configuration for multi-region strong consistency accounts
multi-region-strong.xml TestNG suite configuration for new test group
pom.xml Added Maven profile for multi-region-strong tests
CHANGELOG.md Documented the optimization in Other Changes section

@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

</plugins>
</build>
</profile>
<profile>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[blocking]: wire up Netty bytebuf allocation tracking configs. cc: @FabianMeiswinkel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has been merged now - so, should be easy to copy the same settings

…into failFastBarrierLeaseNotFound

# Conflicts:
#	sdk/cosmos/azure-cosmos-tests/pom.xml
#	sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/rx/TestSuiteBase.java
#	sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResponse.java
#	sdk/cosmos/live-platform-matrix.json
@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Copy link
Member Author

/azp run java - cosmos - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Copy link
Member Author

/azp run java - cosmos - ci

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

final long targetGlobalCommittedLSN,
ReadMode readMode) {
ReadMode readMode,
MutableVolatile<CosmosException> cosmosExceptionValueHolder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an opinonated NIT - so, feel free to ignore.

I would not have used MutableVolatile here - it is safe - and actually slightly more efficient than AtomicReference which I would have preferred. Why I would have used AtomicReference - we used it in many other places - to me consistency and simplicity outweighs the nitty efficieny improvement of MutableVolatile because we don't strictly need atmoic composite operations like INcrement or comapreAndSet

Copy link
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants