Fix OSError: [Errno 24] Too many open files in multi-copy benchmark #5037

zhengjun-xing · 2025-10-21T06:58:49Z

When running benchmarks with a large number of copies, the process may raise:
OSError: [Errno 24] Too many open files.

Example command:
(fbgemm_gpu_env)$ ulimit -n 1048576
(fbgemm_gpu_env)$ python ./bench/tbe/tbe_inference_benchmark.py nbit-cpu \
    --num-embeddings=40000000 --bag-size=2 --embedding-dim=96 \
    --batch-size=162 --num-tables=8 --weights-precision=int4 \
    --output-dtype=fp32 --copies=96 --iters=30000

PyTorch multiprocessing provides two shared-memory strategies:
1. file_descriptor (default)
2. file_system

The default file_descriptor strategy uses file descriptors as shared-memory handles, which can result in a large number of open FDs when many tensors are shared.
If the number of open FDs exceeds the system limit and the limit cannot be raised, the file_system strategy should be used instead.
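
To see how close a process is to the limit described above, a minimal sketch along these lines may help. It is not part of this patch; the helper name `fd_usage` is hypothetical, and the open-FD count relies on `/proc/self/fd`, which is assumed to exist (Linux). The count is approximate because listing the directory briefly opens a descriptor itself.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    open_fds is None when /proc/self/fd is unavailable (non-Linux systems).
    """
    # RLIMIT_NOFILE is the per-process cap on open file descriptors,
    # the same limit that `ulimit -n` reports and changes.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Each entry in /proc/self/fd is one descriptor currently open.
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None
    return open_fds, soft, hard


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print(f"open FDs: {open_fds}, soft limit: {soft}, hard limit: {hard}")
```

Calling this periodically inside the benchmark process would show whether the FD count grows with the number of shared tensors.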

This patch allows switching to the file_system strategy by setting:
export PYTORCH_SHARE_STRATEGY='file_system'

Reference:
https://pytorch.org/docs/stable/multiprocessing.html#sharing-strategies
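
One way the environment-variable switch could be wired up is sketched below. This is an illustration under assumptions, not the code in this patch: the helper names `resolve_share_strategy` and `apply_share_strategy` are hypothetical, and the list of accepted values is hard-coded from the two strategies named above. Only `torch.multiprocessing.set_sharing_strategy` is an existing PyTorch call.

```python
import os

# Strategy names taken from the description above; hard-coded here as an
# assumption. torch.multiprocessing.get_all_sharing_strategies() reports
# what the current platform actually supports.
VALID_STRATEGIES = ("file_descriptor", "file_system")


def resolve_share_strategy(environ=os.environ):
    """Return the strategy requested via PYTORCH_SHARE_STRATEGY, or None."""
    value = environ.get("PYTORCH_SHARE_STRATEGY", "").strip()
    if not value:
        return None  # Variable unset or empty: keep PyTorch's default.
    if value not in VALID_STRATEGIES:
        raise ValueError(
            f"PYTORCH_SHARE_STRATEGY={value!r} is not one of {VALID_STRATEGIES}"
        )
    return value


def apply_share_strategy(environ=os.environ):
    """Apply the requested strategy before any worker processes start."""
    strategy = resolve_share_strategy(environ)
    if strategy is not None:
        # Imported lazily so the parsing logic is usable without PyTorch.
        import torch.multiprocessing as mp

        mp.set_sharing_strategy(strategy)
    return strategy
```

In this sketch, `apply_share_strategy()` would be called once at benchmark startup, before tensors are shared with child processes.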


netlify · 2025-10-21T06:58:56Z

✅ Deploy Preview for pytorch-fbgemm-docs ready!

Name	Link
🔨 Latest commit	`0135160`
🔍 Latest deploy log	https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68f72f2c23f2cb0008303d1f
😎 Deploy Preview	https://deploy-preview-5037--pytorch-fbgemm-docs.netlify.app

meta-cla bot added the cla signed label Oct 21, 2025
