Inconsistent reading performance with multiple cpu threads #2184
              
Unanswered · FelipeMoser asked this question in Q&A · Replies: 1 comment, 1 reply
Reply: Is there any update to this issue from the developers? I'm encountering the same issue in my project.
Zarr version: 2.18.2
Numcodecs version: 0.13.0
Python Version: 3.12.4
Operating System: Linux
Installation: pip install zarr
Description
I've converted some ome.tiff files to .zarr and am seeing inconsistent read times for the zarr files.
For this example I'm using an image of shape (4, 16484, 11620), stored both with chunk size (1, 1024, 1024) and unchunked.
All files are stored on a RAID0 NVMe SSD and use the same compression.
I've compared the reading times using 1, 10, and 50 logical threads (restricted with taskset) and noticed that performance varies greatly depending on the settings. For unchunked files, additional threads significantly improve reading time, just as they do when reading ome.tiffs. In fact, reading unchunked files is significantly faster than reading ome.tiffs. Chunked files, however, do not seem to benefit from additional threads and sometimes even get slower. Reading with the dask library also shows inconsistent performance, although in a different way.
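For anyone trying to reproduce this kind of comparison, here is a minimal timing sketch. It is not the code from this post. `time_with_cpus` and `read_fn` are hypothetical names, and `os.sched_setaffinity` is used as an in-process stand-in for taskset (Linux only). It pins the calling thread, and threads started afterwards inherit the mask, so thread pools created before the call are not affected.

```python
import os
import time


def time_with_cpus(read_fn, n_cpus, repeats=3):
    """Time read_fn() with the calling thread limited to n_cpus logical CPUs.

    Returns the best wall-clock time in seconds over `repeats` runs.
    Hypothetical helper for illustration; not part of zarr or dask.
    """
    can_pin = hasattr(os, "sched_setaffinity")  # only available on Linux
    if can_pin:
        original = os.sched_getaffinity(0)
        # Keep the first n_cpus CPUs of the current mask (at least one).
        allowed = set(sorted(original)[: max(1, n_cpus)])
        os.sched_setaffinity(0, allowed)
    try:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            read_fn()
            best = min(best, time.perf_counter() - start)
        return best
    finally:
        if can_pin:
            # Restore the original mask so later code is unaffected.
            os.sched_setaffinity(0, original)
```

A call might look like `time_with_cpus(lambda: zarr.open(path, mode="r")[:], 10)`, where `path` is a placeholder for the store location. Note that repeated reads of the same file are likely served from the OS page cache, so the first run and later runs can measure different things.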
Additionally, considering the hardware (RAID0 NVMe SSD, dual 56-core Intel Xeon Platinum 8280 CPUs), I'd expect reading chunked files with multiple workers to be much faster, since the chunks can be processed in parallel. Instead, chunked reads not only fail to benefit from more workers, they are an order of magnitude slower than reading an unchunked file.
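To put the two layouts side by side, the chunk grid implied by the shape and chunk size above can be worked out with plain arithmetic. This is only a sketch of the layout, not a measurement. `chunk_grid` is a hypothetical helper name.

```python
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each axis (ceiling division per axis)."""
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


shape = (4, 16484, 11620)
chunks = (1, 1024, 1024)

grid = chunk_grid(shape, chunks)
n_chunks = math.prod(grid)
elements_per_chunk = math.prod(chunks)

print(grid)                # (4, 17, 12)
print(n_chunks)            # 816
print(elements_per_chunk)  # 1048576

# The unchunked layout is a single chunk covering the whole array.
print(math.prod(chunk_grid(shape, shape)))  # 1
```

So a full read of the chunked array touches 816 separately stored and separately decompressed chunks, while the unchunked array is one large chunk. Whether that per-chunk overhead explains the timings reported here is an open question, not something this arithmetic shows.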
Is there something I could be missing here?
Steps to reproduce
This is the code I'm using:
Results:
Additional output
No response