Use fsspec as default backend in etils.epath to connect to GCS #11215

Draft
ankitaluthra1 wants to merge 3 commits into tensorflow:master from ankitaluthra1:master

@ankitaluthra1

Thank you for your contribution!

Please read https://www.tensorflow.org/datasets/contribute#pr_checklist to make sure your PR follows the guidelines.

Add Dataset / Backend Fix

  • Dataset Name: mnist
  • Issue Reference:
  • dataset_info.json Gist:

Description

This PR enables the use of fsspec and gcsfs as the backend for Google Cloud Storage (GCS) operations in etils.epath to support data loading from GCS Rapid buckets.

With this change, if a user sets GCS_PREFER_FSSPEC=true in their environment, tensorflow_datasets automatically sets EPATH_PREFER_FSSPEC=true during package initialization (that is, when tensorflow_datasets is imported), which switches the etils.epath backend to _FileSystemSpecBackend.
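A minimal sketch of the forwarding logic described above, written as a standalone function so it can be exercised against a plain dict instead of the real process environment. The name forward_gcs_fsspec_preference is hypothetical; the PR itself places the check inline in tensorflow_datasets/__init__.py.

```python
import os
from collections.abc import MutableMapping


def forward_gcs_fsspec_preference(
    environ: MutableMapping[str, str] = os.environ,
) -> None:
    """Hypothetical helper mirroring the check added to tensorflow_datasets/__init__.py.

    If GCS_PREFER_FSSPEC is exactly the string 'true', also set
    EPATH_PREFER_FSSPEC='true' so that etils.epath selects its fsspec backend.
    Any other value (or an unset variable) leaves the mapping untouched.
    """
    # Exact, case-sensitive string comparison, as in the PR description.
    if environ.get("GCS_PREFER_FSSPEC") == "true":
        environ["EPATH_PREFER_FSSPEC"] = "true"


# Usage against a plain dict, so the real os.environ is not modified.
env = {"GCS_PREFER_FSSPEC": "true"}
forward_gcs_fsspec_preference(env)
print(env["EPATH_PREFER_FSSPEC"])
```

Because the comparison is an exact string match, GCS_PREFER_FSSPEC=True or GCS_PREFER_FSSPEC=1 would not enable the fsspec backend under this logic. Presumably the check has to run before etils.epath reads EPATH_PREFER_FSSPEC, which would explain why it lives in the package __init__.py; this sketch does not model that ordering.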

Changes included:

  • tensorflow_datasets/__init__.py: Added an environment variable check: if os.environ.get('GCS_PREFER_FSSPEC') == 'true', set os.environ['EPATH_PREFER_FSSPEC'] = 'true'.
  • setup.py: Added 'gcs_prefer_fsspec': ['fsspec', 'gcsfs'] to EXTRAS.
  • tensorflow_datasets/import_test.py: Added tests test_gcs_prefer_fsspec_true and test_gcs_prefer_fsspec_false to ensure the correct backend initializes.
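One way tests like test_gcs_prefer_fsspec_true and test_gcs_prefer_fsspec_false could isolate the environment is with unittest.mock.patch.dict, which restores os.environ when the block exits. This is a sketch under assumptions: _maybe_prefer_fsspec is a hypothetical stand-in for the inline check in __init__.py, and unlike the PR's tests (which reportedly check which backend initializes) it only asserts on the environment variable, since the way etils.epath exposes its selected backend is not shown here.

```python
import os
import unittest
from unittest import mock


def _maybe_prefer_fsspec() -> None:
    # Hypothetical stand-in for the check added to tensorflow_datasets/__init__.py.
    if os.environ.get("GCS_PREFER_FSSPEC") == "true":
        os.environ["EPATH_PREFER_FSSPEC"] = "true"


class GcsPreferFsspecTest(unittest.TestCase):

    def test_gcs_prefer_fsspec_true(self):
        # clear=True starts from an empty environment; patch.dict restores
        # the original os.environ when the with-block exits.
        with mock.patch.dict(os.environ, {"GCS_PREFER_FSSPEC": "true"}, clear=True):
            _maybe_prefer_fsspec()
            self.assertEqual(os.environ.get("EPATH_PREFER_FSSPEC"), "true")

    def test_gcs_prefer_fsspec_false(self):
        # Any value other than exactly 'true' must not set EPATH_PREFER_FSSPEC.
        with mock.patch.dict(os.environ, {"GCS_PREFER_FSSPEC": "false"}, clear=True):
            _maybe_prefer_fsspec()
            self.assertNotIn("EPATH_PREFER_FSSPEC", os.environ)
```

Saved as a module, this can be run with python -m unittest. Patching os.environ per test keeps one test's variables from leaking into the other.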

Validation:
These changes were validated by loading the mnist dataset from both a Rapid bucket and a Standard bucket. Loading succeeded in both cases, confirming that the fsspec backend correctly reads from Rapid buckets as well as from existing Standard storage class buckets.

Checklist

  • Address all TODOs
  • Add alphabetized import to subdirectory's __init__.py
  • Run download_and_prepare successfully (Note: verified via GCS bucket)
  • Add checksums file
  • Properly cite in BibTeX format
  • Add passing test(s)
  • Add test data
  • If using additional dependencies (e.g. scipy), use lazy_imports (if applicable)
  • Add data generation script (if applicable)
  • Lint code
