Skip to content

Conversation

@zhengjun-xing
Copy link

When running benchmarks with a large number of copies, the process may raise:
OSError: [Errno 24] Too many open files.

Example command:
(fbgemm_gpu_env)$ ulimit -n 1048576
(fbgemm_gpu_env)$ python ./bench/tbe/tbe_inference_benchmark.py nbit-cpu
--num-embeddings=40000000 --bag-size=2 --embedding-dim=96
--batch-size=162 --num-tables=8 --weights-precision=int4
--output-dtype=fp32 --copies=96 --iters=30000

PyTorch multiprocessing provides two shared-memory strategies: 1.file_descriptor (default)
2.file_system

The default file_descriptor strategy uses file descriptors as shared memory handles, which can result in a large number of open FDs when many tensors are shared.
If the total number of open FDs exceeds the system limit and cannot be raised, the file_system strategy should be used instead.

This patch allows switching to the file_system strategy by setting:
export PYTORCH_SHARE_STRATEGY='file_system'

Reference:
https://pytorch.org/docs/stable/multiprocessing.html#sharing-strategies

When running benchmarks with a large number of copies, the process may raise:
 OSError: [Errno 24] Too many open files.

Example command:
(fbgemm_gpu_env)$ ulimit -n 1048576
(fbgemm_gpu_env)$ python ./bench/tbe/tbe_inference_benchmark.py nbit-cpu
\
    --num-embeddings=40000000 --bag-size=2 --embedding-dim=96 \
    --batch-size=162 --num-tables=8 --weights-precision=int4 \
    --output-dtype=fp32 --copies=96 --iters=30000

PyTorch multiprocessing provides two shared-memory strategies:
1.file_descriptor (default)
2.file_system

The default file_descriptor strategy uses file descriptors as shared
memory handles, which can result in a large number of open FDs when many
tensors are shared.
If the total number of open FDs exceeds the system limit and cannot be
raised, the file_system strategy should be used instead.

This patch allows switching to the file_system strategy by setting:
  export PYTORCH_SHARE_STRATEGY='file_system'

Reference:
https://pytorch.org/docs/stable/multiprocessing.html#sharing-strategies
@netlify
Copy link

netlify bot commented Oct 21, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 0135160
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68f72f2c23f2cb0008303d1f
😎 Deploy Preview https://deploy-preview-5037--pytorch-fbgemm-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@meta-cla meta-cla bot added the cla signed label Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant