
_get_blob_by_metadata_version fetches oldest versioned blob #218

@anth-volk

Bug Description

The _get_blob_by_metadata_version method in policyengine/utils/data/version_aware_storage_client.py returns the first blob that matches the requested version instead of the newest one.

Current Behavior

When multiple blobs exist with the same metadata version (which can happen when a blob is re-uploaded with the same version tag), the function iterates through bucket.list_blobs() and returns the first match:

for blob in versions:
    if blob.metadata and blob.metadata.get("version") == version:
        return blob  # BUG: Returns FIRST match (oldest), not newest!

Because the first match in the listing is the earliest upload, the method returns the oldest blob with that version tag rather than the newest one.

Expected Behavior

The function should return the blob with the highest generation number (the newest blob) when multiple blobs match the requested version.

Proposed Fix

Collect all matching blobs and return the one with the highest generation number:

matching_blobs = []
for blob in versions:
    if blob.metadata and blob.metadata.get("version") == version:
        matching_blobs.append(blob)

if not matching_blobs:
    raise ValueError(...)

# Return the blob with the highest generation number (newest)
newest_blob = max(matching_blobs, key=lambda b: b.generation)
return newest_blob
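
To make the difference concrete, the following is a minimal, self-contained sketch that compares the current first-match behavior with the proposed highest-generation selection. It does not touch the real storage client: `FakeBlob`, `first_matching`, and `newest_matching` are hypothetical names introduced for illustration, and the sketch assumes only that a blob exposes `metadata` and `generation` as in the snippets above. The error message is also an assumption, since the issue leaves it elided.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBlob:
    """Hypothetical stand-in for a storage blob (illustration only)."""
    name: str
    generation: int
    metadata: Optional[dict] = None


def first_matching(blobs, version):
    """Current behavior: return the first blob whose metadata version matches."""
    for blob in blobs:
        if blob.metadata and blob.metadata.get("version") == version:
            return blob
    return None


def newest_matching(blobs, version):
    """Proposed behavior: return the matching blob with the highest generation."""
    matching = [
        b for b in blobs
        if b.metadata and b.metadata.get("version") == version
    ]
    if not matching:
        # Assumed message; the original issue elides the actual text.
        raise ValueError(f"No blob found with metadata version {version!r}")
    return max(matching, key=lambda b: b.generation)


# Two uploads share the tag "1.0.0"; the later one has the higher generation.
blobs = [
    FakeBlob("data.h5", 1001, {"version": "1.0.0"}),
    FakeBlob("data.h5", 1002, {"version": "1.0.0"}),
    FakeBlob("data.h5", 1003, {"version": "1.1.0"}),
    FakeBlob("data.h5", 1004, None),  # blob without metadata is skipped
]

print(first_matching(blobs, "1.0.0").generation)
print(newest_matching(blobs, "1.0.0").generation)
```

Selecting with `max` on `generation` does not depend on the order in which blobs are listed, so the result stays the same even if the listing order changes.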

Impact

Users requesting a specific version may receive stale data if the blob was re-uploaded with the same version tag.
