groutr (Contributor) commented Oct 28, 2025

The SingleHDF5ToZarr class consumes excessive amounts of memory over extended periods of time. I tracked this memory usage down to the fact that the class never closes the HDF5 file. Adding a SingleHDF5ToZarr.close() method that closes the h5py file and the associated fsspec file object fixed the slow leak.

This cut memory usage from >1GB to less than 300MB after processing a large number of hdf5 files. The close method can be invoked directly or via a contextlib.closing context.

With these changes, the example gen_json function would look like the following to ensure resources are properly cleaned up.

from contextlib import closing

import ujson
from kerchunk.hdf import SingleHDF5ToZarr

# fs, fs2, and so are assumed to be defined as in the existing example

def gen_json(file_url):
    with fs.open(file_url, **so) as infile:
        with closing(SingleHDF5ToZarr(infile, file_url, inline_threshold=300)) as h5chunks:
            # inline_threshold sets the size below which binary blocks are included directly in the output;
            # a higher inline threshold can result in a larger JSON file but faster loading time
            variable = file_url.split('/')[-1].split('.')[0]
            month = file_url.split('/')[2]
            outf = f'{month}_{variable}.json'  # file name to save the JSON to
            with fs2.open(outf, 'wb') as f:
                f.write(ujson.dumps(h5chunks.translate()).encode())

If a file object is passed in, then it is the caller's responsibility to close the file object.
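
The ownership rule above can be illustrated with a small stand-in class. This is only a sketch: `Translator` and its `_owns_file` flag are hypothetical names, not kerchunk's actual implementation, and `io.BytesIO` stands in for an fsspec file object.

```python
import io
from contextlib import closing


class Translator:
    """Hypothetical stand-in for a class that holds a file handle."""

    def __init__(self, source):
        if isinstance(source, (bytes, bytearray)):
            # This object created the handle, so close() must release it.
            self._file = io.BytesIO(source)
            self._owns_file = True
        else:
            # The caller passed an open file object and keeps ownership of it.
            self._file = source
            self._owns_file = False

    def translate(self):
        self._file.seek(0)
        return {"nbytes": len(self._file.read())}

    def close(self):
        # Only close handles this object opened; safe to call more than once.
        if self._owns_file and not self._file.closed:
            self._file.close()


caller_file = io.BytesIO(b"abc")
with closing(Translator(caller_file)) as t:
    result = t.translate()
# closing() has called t.close(), but the caller's file was left open.
still_open = not caller_file.closed
caller_file.close()  # the caller closes what the caller opened

owned = Translator(b"abcd")
owned.close()  # releases the handle the object created itself
```

Tracking ownership with a flag is one possible design; it keeps `close()` from invalidating a handle the caller may still want to use.
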
martindurant (Member) commented Oct 29, 2025

Can you please add something to the docstring saying that the user should call .close() to free up memory? Should a default __del__ implementation call it too?
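
For reference, a `__del__` fallback of the kind asked about could look roughly like the sketch below. `Handle`, its `log` argument, and the labels are hypothetical and exist only for illustration; this is not the code merged in this PR. Note that Python does not guarantee when (or, at interpreter exit, whether) `__del__` runs, so an explicit `.close()` remains the reliable path.

```python
import gc


class Handle:
    """Hypothetical resource holder; not kerchunk's actual class."""

    def __init__(self, log, label):
        self._log = log
        self._label = label
        self._closed = False

    def close(self):
        # Idempotent: a second call (for example from __del__) does nothing.
        if self._closed:
            return
        self._closed = True
        self._log.append(self._label)

    def __del__(self):
        # Fallback only. getattr guards against a partially built object
        # if __init__ raised before _closed was set.
        if not getattr(self, "_closed", True):
            self.close()


log = []

first = Handle(log, "explicit")
first.close()  # explicit cleanup
del first      # __del__ finds the handle already closed and does nothing

second = Handle(log, "fallback")
del second     # no explicit close; __del__ acts as the safety net
gc.collect()   # nudge collection on interpreters without reference counting
```
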

@martindurant martindurant merged commit 08bc7eb into fsspec:main Nov 6, 2025
2 of 4 checks passed