Add early rejection for oversized blobs before data transfer #875

@1-ashraful-islam

Problem

Currently, blob size validation against max_blob_size happens after data transfer has already started, which can waste bandwidth and resources when clients attempt to upload oversized blobs.

Current Behavior

  1. HTTP PUT Handler (server/http.go:291-316)

    • Extracts Content-Length header early
    • Does NOT validate against max_blob_size
    • Streams data to diskCache.Put(), which then rejects it
  2. gRPC ByteStream (server/grpc_bytestream.go:282-404)

    • Parses size from resource name early
    • Does NOT validate against max size
    • Does not reject the upload until the cache.Put() call
  3. Actual Validation (cache/disk/disk.go:248-250)

    • Validation only happens here, after the handlers have started accepting data:

        if size > c.maxBlobSize {
            return badReqErr("Blob size %d too large, max blob size is %d", size, c.maxBlobSize)
        }

Desired Behavior

Reject oversized blobs immediately upon receiving the request, before transferring any data.

Proposed Solution

1. HTTP PUT Early Validation

Add a size check in server/http.go after line 316, once the Content-Length header has been extracted:

if h.cache.MaxBlobSize() > 0 && contentLength > h.cache.MaxBlobSize() {
    msg := fmt.Sprintf("Blob size %d exceeds maximum allowed size %d", 
                       contentLength, h.cache.MaxBlobSize())
    http.Error(w, msg, http.StatusBadRequest)
    h.errorLogger.Printf("PUT %s: %s", path(kind, hash), msg)
    return
}
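
To illustrate the idea outside the real handler, here is a minimal, self-contained sketch using only the standard library. The names `maxBlobSize`, `exceedsLimit`, `putHandler`, and `putStatus` are hypothetical stand-ins and are not taken from server/http.go; the limit of 8 bytes is arbitrary. It only shows that the decision can be made from the declared Content-Length before the body is read.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBlobSize is a hypothetical stand-in for h.cache.MaxBlobSize().
const maxBlobSize int64 = 8

// exceedsLimit reports whether a declared size is over the limit.
// A non-positive limit is treated as "no limit configured".
func exceedsLimit(size, limit int64) bool {
	return limit > 0 && size > limit
}

// putHandler rejects oversized uploads based on Content-Length alone,
// before reading anything from the request body.
func putHandler(w http.ResponseWriter, r *http.Request) {
	if exceedsLimit(r.ContentLength, maxBlobSize) {
		msg := fmt.Sprintf("Blob size %d exceeds maximum allowed size %d",
			r.ContentLength, maxBlobSize)
		http.Error(w, msg, http.StatusBadRequest)
		return
	}
	// Only at this point is the body consumed.
	n, _ := io.Copy(io.Discard, r.Body)
	fmt.Fprintf(w, "stored %d bytes", n)
}

// putStatus sends a PUT with the given body through putHandler
// and returns the resulting HTTP status code.
func putStatus(body string) int {
	req := httptest.NewRequest(http.MethodPut, "/cas/abc", strings.NewReader(body))
	rec := httptest.NewRecorder()
	putHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(putStatus("small"))         // 5 bytes, within the limit
	fmt.Println(putStatus("far too large")) // 13 bytes, over the limit
}
```

One caveat with this approach: Go's net/http reports `ContentLength` as -1 when the length is unknown (for example, chunked uploads), so such requests pass this check and would still rely on the existing validation in the disk cache.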

2. gRPC ByteStream Early Validation

Add a size check in server/grpc_bytestream.go after line 409, once the size has been parsed from the resource name:

if s.maxCasBlobSizeBytes > 0 && size > s.maxCasBlobSizeBytes {
    recvResult <- status.Errorf(codes.InvalidArgument,
        "Blob size %d exceeds maximum allowed size %d", 
        size, s.maxCasBlobSizeBytes)
    return
}
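
The same check can be sketched without any gRPC dependency, since the declared size is available in the ByteStream upload resource name before any data chunk is processed. The sketch below assumes the uncompressed Remote Execution API form `[instance/]uploads/{uuid}/blobs/{hash}/{size}`; the helpers `declaredSize` and `checkUpload` and the error value `errTooLarge` are hypothetical and do not come from server/grpc_bytestream.go. In the real server the error would presumably be returned as a `codes.InvalidArgument` status, as in the snippet above.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// errTooLarge is a hypothetical sentinel for oversized uploads.
var errTooLarge = errors.New("blob exceeds maximum allowed size")

// declaredSize extracts the size segment from an upload resource name of the
// assumed form "[instance/]uploads/{uuid}/blobs/{hash}/{size}".
func declaredSize(resourceName string) (int64, error) {
	parts := strings.Split(resourceName, "/")
	for i := 0; i+4 < len(parts); i++ {
		if parts[i] == "uploads" && parts[i+2] == "blobs" {
			return strconv.ParseInt(parts[i+4], 10, 64)
		}
	}
	return 0, fmt.Errorf("no size found in resource name %q", resourceName)
}

// checkUpload decides whether an upload may proceed, using only the
// resource name. A non-positive maxSize disables the limit.
func checkUpload(resourceName string, maxSize int64) error {
	size, err := declaredSize(resourceName)
	if err != nil {
		return err
	}
	if maxSize > 0 && size > maxSize {
		return fmt.Errorf("%w: blob size %d, maximum %d", errTooLarge, size, maxSize)
	}
	return nil
}

func main() {
	const limit = 100
	for _, name := range []string{
		"uploads/1234/blobs/abc/10",
		"instance/uploads/1234/blobs/abc/1000",
	} {
		if err := checkUpload(name, limit); err != nil {
			fmt.Println("reject:", err)
			continue
		}
		fmt.Println("accept:", name)
	}
}
```

Because the decision depends only on the first request's resource name, it can run before the handler starts forwarding chunks to the cache.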

Benefits

  • Prevents unnecessary bandwidth consumption
  • Provides immediate feedback to clients
  • Reduces server resource usage
  • Fails fast with clear error messages

Configuration

The existing max_blob_size configuration option (set via CLI flag, env var, or YAML) would continue to work as-is. This change only moves the validation earlier in the request lifecycle.
