
feat(core): add TAPI backoff v2 infrastructure (1/3) #1149

Closed
abueide wants to merge 1 commit into master from feature/tapi-infra

abueide (Contributor) commented Mar 6, 2026

Overview

This PR adds the core backoff infrastructure for TAPI backoff v2. This is part 1 of a 3-part implementation that establishes the foundation without integration into the upload flow.

Related PRs:

Components Added

UploadStateMachine

Manages global rate limiting state for 429 responses.

Key methods:

  • canUpload(): Upload gate that respects rate limit wait times
  • handle429(): Sets rate limit state with configurable retry limits
  • reset(): Clears rate limit state on successful upload

The state machine persists state across app restarts and supports two states: READY and RATE_LIMITED.
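
The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the snapshot fields, the injected clock, the `retryAfterSeconds` parameter, and the policy of resetting once the retry limit is exceeded are all assumptions, and persistence is left out.

```typescript
// Hypothetical sketch of the 429 gate. Only canUpload, handle429, reset,
// READY and RATE_LIMITED come from the PR description; the rest is assumed.
type UploadState = "READY" | "RATE_LIMITED";

interface UploadStateSnapshot {
  state: UploadState;
  waitUntilMs: number; // earliest time the next upload may start
  retryCount: number; // consecutive 429 responses seen so far
}

class UploadStateMachineSketch {
  private data: UploadStateSnapshot = { state: "READY", waitUntilMs: 0, retryCount: 0 };
  private readonly maxRetryCount: number;
  private readonly now: () => number;

  constructor(maxRetryCount: number, now: () => number = () => Date.now()) {
    this.maxRetryCount = maxRetryCount;
    this.now = now;
  }

  // Upload gate: open when READY, or once the rate-limit wait has elapsed.
  canUpload(): boolean {
    if (this.data.state === "READY") return true;
    if (this.now() >= this.data.waitUntilMs) {
      this.data = { state: "READY", waitUntilMs: 0, retryCount: this.data.retryCount };
      return true;
    }
    return false;
  }

  // Record a 429 response and block uploads for retryAfterSeconds.
  handle429(retryAfterSeconds: number): void {
    const retryCount = this.data.retryCount + 1;
    if (retryCount > this.maxRetryCount) {
      // Assumed policy: clear the rate-limit state once the limit is exceeded.
      this.reset();
      return;
    }
    this.data = {
      state: "RATE_LIMITED",
      waitUntilMs: this.now() + retryAfterSeconds * 1000,
      retryCount,
    };
  }

  // Clear rate-limit state, for example after a successful upload.
  reset(): void {
    this.data = { state: "READY", waitUntilMs: 0, retryCount: 0 };
  }

  // Copy of the current state, which a real implementation would persist.
  snapshot(): UploadStateSnapshot {
    return {
      state: this.data.state,
      waitUntilMs: this.data.waitUntilMs,
      retryCount: this.data.retryCount,
    };
  }
}
```

Injecting the clock keeps the sketch deterministic under test; a caller would check `canUpload()` before each flush and call `reset()` after a successful upload.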

BatchUploadManager

Handles per-batch exponential backoff for transient errors like 5xx responses and network failures.

Key methods:

  • createBatch(): Creates metadata for a batch (only called on first failure)
  • handleRetry(): Calculates exponential backoff with jitter and updates metadata
  • validatePersistedMetadata(): Validates timestamps on app restart and drops corrupted batches
  • removeBatch(): Removes metadata after successful upload or when limits exceeded
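
As a rough illustration of the delay calculation that `handleRetry()` is described as performing, the sketch below computes a capped exponential delay and adds proportional jitter. The parameter names, the additive jitter scheme, and the example values are assumptions; the PR description does not show the actual formula.

```typescript
// Hypothetical parameter names; the PR's actual BackoffConfig fields are not shown.
interface BackoffParams {
  baseMs: number; // delay before the first retry
  maxMs: number; // cap applied before jitter is added
  multiplier: number; // growth factor per retry
  jitterRatio: number; // 0.1 means up to 10% extra delay
}

// Delay for the given 0-based retry: min(base * multiplier^retry, max) plus jitter.
function backoffDelayMs(
  retryCount: number,
  params: BackoffParams,
  random: () => number = Math.random,
): number {
  const exponential = params.baseMs * Math.pow(params.multiplier, retryCount);
  const capped = Math.min(exponential, params.maxMs);
  const jitter = capped * params.jitterRatio * random();
  return Math.round(capped + jitter);
}

const params: BackoffParams = { baseMs: 500, maxMs: 60000, multiplier: 2, jitterRatio: 0.1 };
// Passing random = () => 0 disables jitter, which makes the growth easy to inspect.
const delays = [0, 1, 2, 3].map((retry) => backoffDelayMs(retry, params, () => 0));
// delays is [500, 1000, 2000, 4000]
```

Jitter spreads retries from many clients so they do not all hit the server at the same instant after an outage.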

Config Validation

Provides validation and default application for backoff configurations:

  • validateRateLimitConfig(): Validates rate limit configuration
  • validateBackoffConfig(): Validates backoff configuration

Both functions ensure that all numeric values are within safe bounds and apply sensible defaults when values are missing.
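
One way such validation could work is to clamp each numeric field into a range and fall back to a default when the value is missing or not a finite number. The field names, bounds, and defaults below are invented for illustration and are not taken from the PR.

```typescript
// Hypothetical field names, bounds, and defaults, chosen only for illustration.
interface BackoffConfigSketch {
  maxRetryCount: number;
  baseBackoffMs: number;
  maxBackoffMs: number;
}

const DEFAULTS: BackoffConfigSketch = {
  maxRetryCount: 10,
  baseBackoffMs: 500,
  maxBackoffMs: 300000,
};

// Returns fallback for missing or non-numeric input, otherwise clamps into [min, max].
function clampOrDefault(value: unknown, min: number, max: number, fallback: number): number {
  if (typeof value !== "number" || !isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

function validateBackoffConfigSketch(
  input: { [key: string]: unknown } | undefined,
): BackoffConfigSketch {
  const raw: { [key: string]: unknown } = input || {};
  const maxBackoffMs = clampOrDefault(raw["maxBackoffMs"], 1000, 3600000, DEFAULTS.maxBackoffMs);
  // The base delay may never exceed the validated maximum delay.
  const baseBackoffMs = clampOrDefault(raw["baseBackoffMs"], 100, maxBackoffMs, DEFAULTS.baseBackoffMs);
  const maxRetryCount = Math.floor(clampOrDefault(raw["maxRetryCount"], 0, 100, DEFAULTS.maxRetryCount));
  return { maxRetryCount, baseBackoffMs, maxBackoffMs };
}
```

Accepting `unknown` values reflects that remotely delivered configuration cannot be trusted to match the declared types.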

Type Definitions

Adds TypeScript types for the backoff system:

  • HttpConfig: Container for Settings CDN configuration
  • RateLimitConfig: Global rate limiting configuration
  • BackoffConfig: Per-batch backoff configuration
  • UploadStateData: State machine persistence types
  • BatchMetadata: Per-batch retry metadata
  • ErrorClassification: Error classification result type
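
To show how a classification result might route a failure to one of the two components, here is a small sketch. The shape of `ErrorClassificationSketch`, the `kind` values, and the rule that all other status codes are not retried are assumptions; only the 429 and 5xx/network groupings come from the description above.

```typescript
// Hypothetical shape: the PR names ErrorClassification but does not show its fields.
type ErrorKind = "rate_limited" | "transient" | "permanent";

interface ErrorClassificationSketch {
  kind: ErrorKind;
  retryable: boolean;
}

// Classifies a failed upload. statusCode is undefined when the request failed
// before any response arrived (a network failure).
function classifyUploadError(statusCode: number | undefined): ErrorClassificationSketch {
  if (statusCode === undefined) return { kind: "transient", retryable: true };
  if (statusCode === 429) return { kind: "rate_limited", retryable: true };
  if (statusCode >= 500 && statusCode <= 599) return { kind: "transient", retryable: true };
  // Assumed: any other failure status is treated as permanent and not retried.
  return { kind: "permanent", retryable: false };
}
```

Under this reading, `rate_limited` results would go to the UploadStateMachine and `transient` results to the BatchUploadManager.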

Implementation Details

State Migration

The state machine automatically migrates legacy WAITING state to RATE_LIMITED on app startup to maintain compatibility with any persisted state from development builds.
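
A migration of this kind can be a small pure function applied when persisted state is loaded. The sketch below assumes a `waitUntilMs` field and a fallback to READY for unrecognized values; neither is confirmed by the PR description.

```typescript
type CurrentState = "READY" | "RATE_LIMITED";

interface PersistedUploadState {
  state: string; // may hold the legacy "WAITING" value
  waitUntilMs: number; // hypothetical field name
}

interface MigratedUploadState {
  state: CurrentState;
  waitUntilMs: number;
}

// Maps persisted state onto the two supported states.
function migrateUploadState(persisted: PersistedUploadState): MigratedUploadState {
  if (persisted.state === "WAITING" || persisted.state === "RATE_LIMITED") {
    // Legacy WAITING keeps its wait deadline and becomes RATE_LIMITED.
    return { state: "RATE_LIMITED", waitUntilMs: persisted.waitUntilMs };
  }
  // Assumed fallback: READY and unknown values both start from a clean READY state.
  return { state: "READY", waitUntilMs: 0 };
}
```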

Metadata Validation

On app restart, the BatchUploadManager validates all persisted batch metadata timestamps. Batches with corrupted or invalid timestamps (negative values, far-future dates, or dates in the past) are automatically dropped to prevent issues.
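
The check could look something like the sketch below. The description does not say which timestamp fields are validated or what counts as "far-future" or "in the past", so the `firstFailureMs` field, the one-day future horizon, and the seven-day maximum age are assumptions that represent only one possible reading.

```typescript
// Hypothetical metadata shape with a single timestamp field.
interface BatchMetadataSketch {
  batchId: string;
  firstFailureMs: number;
}

// A timestamp is plausible if it is finite, non-negative, not further in the
// future than maxFutureMs, and not older than maxAgeMs.
function isPlausibleTimestamp(
  ts: number,
  nowMs: number,
  maxAgeMs: number,
  maxFutureMs: number,
): boolean {
  if (typeof ts !== "number" || !isFinite(ts) || ts < 0) return false;
  if (ts > nowMs + maxFutureMs) return false;
  if (ts < nowMs - maxAgeMs) return false;
  return true;
}

// Keeps only batches whose timestamp passes the plausibility check.
function dropCorruptedBatches(batches: BatchMetadataSketch[], nowMs: number): BatchMetadataSketch[] {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return batches.filter((b) => isPlausibleTimestamp(b.firstFailureMs, nowMs, 7 * DAY_MS, DAY_MS));
}
```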

Batch Lifecycle

Batch metadata is only created on the first failure, not when the batch is initially created. This prevents unnecessary persistence overhead for successful uploads.
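
The lifecycle can be pictured with a small in-memory store, sketched below. The class name, the `onUploadResult` method, and the metadata fields are hypothetical; the real BatchUploadManager exposes `createBatch()`, `handleRetry()`, and `removeBatch()` and persists its state.

```typescript
// Hypothetical metadata fields.
interface RetryMetadata {
  retryCount: number;
  firstFailureMs: number;
}

class BatchMetadataStoreSketch {
  private readonly byBatchId: { [batchId: string]: RetryMetadata } = {};

  // Success removes any metadata; the first failure creates it; later failures update it.
  onUploadResult(batchId: string, succeeded: boolean, nowMs: number): void {
    if (succeeded) {
      delete this.byBatchId[batchId];
      return;
    }
    const existing = this.byBatchId[batchId];
    if (existing === undefined) {
      this.byBatchId[batchId] = { retryCount: 0, firstFailureMs: nowMs };
    } else {
      existing.retryCount += 1;
    }
  }

  get(batchId: string): RetryMetadata | undefined {
    return this.byBatchId[batchId];
  }

  size(): number {
    return Object.keys(this.byBatchId).length;
  }
}
```

A batch that uploads successfully on its first attempt never touches the store, which is the persistence saving the paragraph above describes.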

Configuration

Both rate limiting and backoff behavior can be configured via Settings CDN through the httpConfig field. The implementation includes production-ready defaults that can be overridden per workspace.
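
A per-workspace override could be resolved by layering the remote `httpConfig` over built-in defaults, as in the sketch below. The nested key names (`rateLimitConfig`, `backoffConfig`), the individual fields, and the default values are assumptions made for illustration; only the `httpConfig` field name appears in the description.

```typescript
// Hypothetical config shapes and field names.
interface RateLimitConfigSketch {
  enabled: boolean;
  maxRetryCount: number;
}

interface BackoffConfigSketch {
  enabled: boolean;
  baseBackoffMs: number;
}

interface HttpConfigSketch {
  rateLimitConfig: RateLimitConfigSketch;
  backoffConfig: BackoffConfigSketch;
}

// Remote overrides may omit any section or any field within a section.
interface HttpConfigOverride {
  rateLimitConfig?: Partial<RateLimitConfigSketch>;
  backoffConfig?: Partial<BackoffConfigSketch>;
}

interface SettingsSketch {
  httpConfig?: HttpConfigOverride;
}

const DEFAULT_HTTP_CONFIG: HttpConfigSketch = {
  rateLimitConfig: { enabled: true, maxRetryCount: 100 },
  backoffConfig: { enabled: true, baseBackoffMs: 500 },
};

// Fields present in the remote settings win; everything else keeps its default.
function resolveHttpConfig(settings: SettingsSketch): HttpConfigSketch {
  const override: HttpConfigOverride = settings.httpConfig || {};
  return {
    rateLimitConfig: { ...DEFAULT_HTTP_CONFIG.rateLimitConfig, ...override.rateLimitConfig },
    backoffConfig: { ...DEFAULT_HTTP_CONFIG.backoffConfig, ...override.backoffConfig },
  };
}
```

In practice the merged result would still pass through validation before use, since remote values may be out of range.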

Testing

These components are designed to be unit-testable in isolation. Integration testing will be covered when these components are integrated into the upload flow in part 3.

Dependencies

This PR has no dependencies and can be reviewed independently.

Adds core backoff infrastructure without integration into upload flow.
This PR establishes the foundation for intelligent retry logic.

Components added:
- UploadStateMachine: Manages global rate limiting state (429 responses)
- BatchUploadManager: Per-batch exponential backoff (transient errors)
- Config validation: Validates and applies defaults to backoff configs
- Type definitions: HttpConfig, RateLimitConfig, BackoffConfig, etc.

Key features:
- RATE_LIMITED state with migration from legacy WAITING state
- Metadata validation on app restart to handle corrupted timestamps
- Batch metadata only created on first failure (not prematurely)
- Exponential backoff with jitter for transient errors
- Configurable behavior via Settings CDN

This PR does not integrate with SegmentDestination yet - that comes in
a follow-up PR to keep review size manageable.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

abueide commented Mar 6, 2026

Note: Unit tests for the backoff infrastructure will be added in a separate PR to keep the review focused on the implementation.

abueide self-assigned this Mar 6, 2026
abueide marked this pull request as draft March 6, 2026 20:22

abueide commented Mar 6, 2026

Closing this PR - infrastructure split into two separate PRs for better reviewability:

  • UploadStateMachine: feature/tapi-state-machine (574 lines)
  • BatchUploadManager: feature/tapi-batch-manager (926 lines)

New PRs to be created shortly.

abueide closed this Mar 6, 2026
abueide deleted the feature/tapi-infra branch March 6, 2026 21:36