Hi,
I may have identified a caching issue in EOReader related to how temporary files are named and reused.
Summary
EOReader writes temporary files after loading a band (with all processing applied):
- cropping
- resampling
- nodata handling
- reflectance conversion
- etc.
These files are reused in subsequent product.load() calls based on a filename hash that encodes parameters such as:
- to_reflectance (-> "as_is" in the filename when to_reflectance=False)
- clean_optical (CleanMethod.name in the filename)
- window (hash)
- pixel_size (e.g. "10m")
However, the resampling parameter is not included in the cache key / filename.
Expected behavior
Changing the resampling method should produce:
- either a different temporary file
- or force recomputation
Actual behavior
If I run:
bili = product.load(BAND, resampling=Resampling.bilinear)
cubic = product.load(BAND, resampling=Resampling.cubic)
The second call returns the same data as the first one, because EOReader reuses the cached file even though the resampling method is different.
Workaround
Calling product.clean_tmp() between the two calls to product.load() deletes all the temporary files and forces a reload with the requested resampling method.
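To make the workaround concrete, here is a minimal sketch of how I wrap it. The helper name load_fresh is made up for illustration; it only relies on product.clean_tmp() and product.load(), which are the calls mentioned above.

```python
def load_fresh(product, band, resampling):
    """Hypothetical helper: drop cached band files, then load the band again.

    `product` is assumed to be an already opened EOReader product.
    """
    # Delete every temporary file so that no stale band can be reused
    product.clean_tmp()
    # Recompute the band with the resampling method actually requested
    return product.load(band, resampling=resampling)
```

Note that this throws away all the temporary files, not only the one for the band in question, so every other cached band is recomputed on its next load.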
Why this is problematic
- Results are incorrect and misleading
- The output depends on the first call, not the current parameters
- This breaks reproducibility and determinism
Suggested fix
Include resampling in the cache key / temporary filename.
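A rough sketch of what I mean is below. This is not EOReader's actual code: tmp_band_name, its parameters and the Resampling stand-in enum are all hypothetical, and the filename layout is only an assumption based on the parameters listed above.

```python
import hashlib
from enum import Enum


class Resampling(Enum):
    # Stand-in for the real resampling enum, for illustration only
    bilinear = 1
    cubic = 2


def tmp_band_name(band, pixel_size, to_reflectance, clean_method_name,
                  window=None, resampling=None):
    """Hypothetical cache filename builder (not EOReader's implementation)."""
    parts = [band, f"{pixel_size}m"]
    if not to_reflectance:
        parts.append("as_is")
    parts.append(clean_method_name)
    if window is not None:
        # Short, stable hash of the window description
        parts.append(hashlib.md5(repr(window).encode()).hexdigest()[:8])
    if resampling is not None:
        # Proposed addition: the resampling method becomes part of the key
        parts.append(resampling.name)
    return "_".join(parts) + ".tif"


# Without resampling in the key, both requests map to the same file
old_bili = tmp_band_name("RED", 10, True, "nodata")
old_cubic = tmp_band_name("RED", 10, True, "nodata")

# With resampling in the key, each method gets its own file
new_bili = tmp_band_name("RED", 10, True, "nodata", resampling=Resampling.bilinear)
new_cubic = tmp_band_name("RED", 10, True, "nodata", resampling=Resampling.cubic)
```

With a key like this, files for different resampling methods could coexist on disk, so switching back and forth would not force a recomputation each time.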
Is this behavior intentional? Are there other parameters not included in the cache key?
Thank you!