Skip to content

fix: BatchResult completion reason logic#286

Merged
yaythomas merged 1 commit intomainfrom
completion-reason
Feb 6, 2026
Merged

fix: BatchResult completion reason logic#286
yaythomas merged 1 commit intomainfrom
completion-reason

Conversation

@yaythomas
Copy link
Member

@yaythomas yaythomas commented Jan 30, 2026

Description of changes:
Fix BatchResult.from_items() to match JavaScript SDK behavior. The previous implementation incorrectly checked conditions in the wrong order, causing incorrect completion reasons for concurrent operations with failure tolerance configurations.

The checkpointed completion reason is preserved during replay, so existing executions are unaffected. Code with conditional logic based on completion_reason might see different values after this update.

Example: With 3 items (1 success, 2 failures) and tolerated_failure_count=1:

  • Before: ALL_COMPLETED (incorrect - all items finished)
  • After: FAILURE_TOLERANCE_EXCEEDED (correct - tolerance breached)

Changes:

  • Extract completion reason logic to _get_completion_reason() method
  • Check failure tolerance BEFORE checking if all completed
  • Implement proper fail-fast when no completion config provided
  • Add comprehensive unit tests covering all edge cases

This ensures tolerance checks take precedence over success criteria, preventing operations from incorrectly reporting ALL_COMPLETED when failure tolerance has been exceeded.

new behaviour

Different completion_reason values for map/parallel operations in these scenarios:

  1. No completion config with failures - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
  2. Empty completion config with failures - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
  3. Tolerance exceeded with all completed - Now returns FAILURE_TOLERANCE_EXCEEDED (previously ALL_COMPLETED)
  4. Both min_successful and tolerance met - Now returns FAILURE_TOLERANCE_EXCEEDED (previously MIN_SUCCESSFUL_REACHED)

reference logic

https://github.com/aws/aws-durable-execution-sdk-js/blob/main/packages/aws-durable-execution-sdk-js/src/handlers/concurrent-execution-handler/concurrent-execution-handler.ts#L41-L75

  private getCompletionReason<T, R>(
    failureCount: number,
    successCount: number,
    completedCount: number,
    items: ConcurrentExecutionItem<T>[],
    config: ConcurrencyConfig<R>,
  ): "ALL_COMPLETED" | "MIN_SUCCESSFUL_REACHED" | "FAILURE_TOLERANCE_EXCEEDED" {
    // Check tolerance first, before checking if all completed
    const completion = config.completionConfig;

    // Handle fail-fast behavior (no completion config or empty completion config)
    if (!completion) {
      if (failureCount > 0) return "FAILURE_TOLERANCE_EXCEEDED";
    } else {
      const hasAnyCompletionCriteria = Object.values(completion).some(
        (value) => value !== undefined,
      );
      if (!hasAnyCompletionCriteria) {
        if (failureCount > 0) return "FAILURE_TOLERANCE_EXCEEDED";
      } else {
        // Check specific tolerance thresholds
        if (
          completion.toleratedFailureCount !== undefined &&
          failureCount > completion.toleratedFailureCount
        ) {
          return "FAILURE_TOLERANCE_EXCEEDED";
        }
        if (completion.toleratedFailurePercentage !== undefined) {
          const failurePercentage = (failureCount / items.length) * 100;
          if (failurePercentage > completion.toleratedFailurePercentage) {
            return "FAILURE_TOLERANCE_EXCEEDED";
          }
        }
      }
    }

Issue #, if available:
closes #280

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Fix BatchResult.from_items() to match JavaScript SDK behavior.
The previous implementation incorrectly checked conditions in the
wrong order, causing incorrect completion reasons for concurrent
operations with failure tolerance configurations.

The checkpointed completion reason is preserved during replay,
so existing executions are unaffected. Code with conditional
logic based on completion_reason might see different values after
this update.

Example: With 3 items (1 success, 2 failures) and tolerated_failure_count=1:
- Before: ALL_COMPLETED (incorrect - all items finished)
- After: FAILURE_TOLERANCE_EXCEEDED (correct - tolerance breached)

Changes:
- Extract completion reason logic to _get_completion_reason() method
- Check failure tolerance BEFORE checking if all completed
- Implement proper fail-fast when no completion config provided
- Add comprehensive unit tests covering all edge cases

This ensures tolerance checks take precedence over success criteria,
preventing operations from incorrectly reporting ALL_COMPLETED when
failure tolerance has been exceeded.

closes #280
@yaythomas yaythomas merged commit e5824f1 into main Feb 6, 2026
14 of 15 checks passed
@yaythomas yaythomas deleted the completion-reason branch February 6, 2026 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: completion_reason add tolerance checks

3 participants