Hello,
I am currently implementing a pipeline with DDP and wids. My dataloaders look like the following:
chunk_size = math.ceil(dataset_length / int(os.environ["WORLD_SIZE"]))
dataset = wids.ShardListDataset(
    wids_map["shardlist"],
    cache_dir=cache_dir,
    keep=True,
).add_transform(preprocess)
loader = torch.utils.data.DataLoader(
    dataset,
    num_workers=num_workers,
    batch_size=batch_size,
    collate_fn=identify_fn,
    pin_memory=True,
    sampler=wids.DistributedChunkedSampler(dataset, chunksize=chunk_size, shuffle=True)
    if "train"
    else None,
)
While everything seems to be working correctly, I am seeing warnings about the cache miss rate, e.g. Warning: ShardListDataset has a cache miss rate of 9901.0%%. I haven't found any documentation on this and was wondering what it signifies for ShardListDataset, given that the data is already cached locally on disk and cache_dir simply points there. I'm not sure how a cache miss could occur, and yet training proceeds through every iteration and epoch without any apparent performance impact.