
Add production deployment scripts and documentation#19

Open
sahinemreaslan wants to merge 7 commits into facebookresearch:main from
sahinemreaslan:claude/clean-up-article-code-011CUyrVpQCTCvJ7jn17iZwA

Conversation

@sahinemreaslan

This commit adds comprehensive deployment infrastructure for EdgeTAM:

1. Model Export Scripts:
   - export_to_onnx.py: Export PyTorch model to ONNX format
   - convert_to_tensorrt.py: Convert ONNX to TensorRT engines

2. Inference Examples:
   - deploy/pytorch_inference.py: Reference PyTorch implementation
   - deploy/onnx_inference.py: Production-ready ONNX inference
   - deploy/tensorrt_inference.py: High-performance TensorRT inference

3. Documentation:
   - DEPLOYMENT.md: Comprehensive deployment guide (Turkish)
   - requirements-deploy.txt: Deployment dependencies

Features:
- Support for ONNX and TensorRT deployment
- Simulation mode for performance benchmarking
- Real-world integration examples
- Docker deployment instructions
- Performance optimization tips

The deployment pipeline:
PyTorch Model -> ONNX -> TensorRT (FP32/FP16/INT8)

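The ONNX -> TensorRT stage of the pipeline above might look like the following minimal sketch. This is not the PR's convert_to_tensorrt.py; the precision-to-flag mapping (including enabling FP16 as a fallback alongside INT8) is an illustrative assumption.

```python
# Minimal sketch of an ONNX -> TensorRT conversion with selectable precision.
# The mapping below is an illustrative assumption, not the PR's script.

PRECISION_FLAGS = {
    "fp32": [],                # full precision: no extra builder flags
    "fp16": ["FP16"],
    "int8": ["INT8", "FP16"],  # FP16 fallback for layers without INT8 kernels
}

def builder_flag_names(precision):
    """Map a precision string to TensorRT BuilderFlag names."""
    if precision not in PRECISION_FLAGS:
        raise ValueError("precision must be one of: %s" % sorted(PRECISION_FLAGS))
    return PRECISION_FLAGS[precision]

def build_engine(onnx_path, engine_path, precision="fp16"):
    import tensorrt as trt  # imported lazily: heavy, GPU-only dependency

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # Explicit-batch network, as required for ONNX models on TensorRT 8.x
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("ONNX parse failed: %s" % parser.get_error(0))

    config = builder.create_builder_config()
    for name in builder_flag_names(precision):
        config.set_flag(getattr(trt.BuilderFlag, name))
    # Note: INT8 additionally needs a calibrator or precomputed dynamic
    # ranges, which this sketch omits.

    engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(bytearray(engine))
```

Separate engines would be built per precision, trading accuracy for throughput as the precision drops.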

meta-cla bot commented Nov 10, 2025

Hi @sahinemreaslan!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

This commit fixes the ONNX export error by properly handling the
high-resolution features that EdgeTAM uses.

Changes:
1. export_to_onnx.py:
   - EdgeTAMImageEncoder now exports high-res features (256x256, 128x128)
   - EdgeTAMMaskDecoder accepts optional high-res feature inputs
   - Auto-detect use_high_res_features from model config
   - Update opset version to 18 (recommended for PyTorch 2.3+)

2. deploy/onnx_inference.py:
   - Support high-res features in inference
   - Auto-detect model capabilities from ONNX outputs
   - Handle both single-output and multi-output encoders

3. deploy/tensorrt_inference.py:
   - Allocate additional GPU buffers for high-res features
   - Support high-res features in encode/decode pipeline
   - Auto-detect engine capabilities

The exported ONNX models now properly utilize EdgeTAM's high-resolution
feature pyramid for better segmentation accuracy.
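The auto-detection described in (2) can be as simple as counting the encoder's ONNX outputs. A hedged sketch, where the output names are hypothetical and the three-output convention (image embedding plus the two high-res feature levels) mirrors the commit message:

```python
def uses_high_res_features(encoder_output_names):
    """Heuristic per the commit message: a single output is a plain image
    embedding; three outputs add the 256x256 and 128x128 feature levels."""
    return len(encoder_output_names) >= 3

def load_encoder(onnx_path):
    # onnxruntime imported lazily so the pure helper above has no dependencies
    import onnxruntime as ort

    sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    names = [out.name for out in sess.get_outputs()]
    return sess, uses_high_res_features(names)
```

The same count-based check extends naturally to the TensorRT side, where the number of engine bindings plays the role of the ONNX output list.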

The forward_image() method already applies conv_s0 and conv_s1 to the
high-resolution features, so we should not apply them again in the
export wrapper. This was causing a channel mismatch error:
'expected input to have 256 channels, but got 32 channels instead'.

The new torch.export/dynamo exporter has compatibility issues with
EdgeTAM's complex architecture. Switch to the legacy ONNX exporter
by setting dynamo=False, which is more stable and widely tested.

This resolves torch.export tracing errors with the model's forward_image
and high-resolution feature handling.
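Selecting the legacy exporter is a one-argument change. A minimal sketch (the wrapper, tensor names, and output path are placeholders; the `dynamo` keyword is available on recent PyTorch releases, roughly 2.5 and later, while older versions use the legacy path by default):

```python
# Keyword arguments for torch.onnx.export, kept in one place so the
# exporter choice is explicit. All names here are illustrative.
EXPORT_KWARGS = dict(
    input_names=["image"],
    output_names=["image_embed"],
    opset_version=18,   # recommended for PyTorch 2.3+
    dynamo=False,       # use the legacy TorchScript-based exporter
)

def export_encoder(wrapper, dummy_image, out_path="encoder.onnx"):
    import torch  # imported lazily; the export itself needs the full model

    wrapper.eval()
    torch.onnx.export(wrapper, (dummy_image,), out_path, **EXPORT_KWARGS)
```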
Display encoder outputs and decoder inputs count to help debug
model configuration issues. This will show whether high-res features
are correctly detected.

Dynamic axes were causing broadcasting errors in ONNX Runtime with certain
operations (like index_put). Use fixed batch size (1) and fixed number of
points (1) for more stable ONNX export.

This is acceptable for production deployment as:
- Batch size 1 is typical for real-time inference
- Multiple points can be added in sequence if needed
- Fixed shapes have better runtime performance

Fixes: ONNXRuntimeError with Where/index_put_2 node

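With dynamic axes removed, prompts have to be packed into the fixed shapes baked into the export. A sketch of that packing, following the commit message's batch size 1 and single point (the float32 label dtype and the helper name are assumptions):

```python
import numpy as np

def format_point_prompt(x, y, label=1):
    """Pack one click into fixed-shape inputs for the exported decoder:
    coords of shape (1, 1, 2) and labels of shape (1, 1), both float32.
    label=1 marks a foreground click, label=0 a background click."""
    coords = np.array([[[x, y]]], dtype=np.float32)
    labels = np.array([[label]], dtype=np.float32)
    return coords, labels
```

Multiple clicks would then be fed one at a time, as the commit message suggests, rather than batched along a dynamic points axis.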
- Fix default config path in export_to_onnx.py (configs/edgetam.yaml -> sam2/configs/edgetam.yaml)
- Update DEPLOYMENT.md with correct config paths
- Fix help text in export script to reference correct inference script
- Add HIZLI_BASLANGIC.md (Turkish quick start guide) with step-by-step instructions
- Improve user experience with clear setup and usage instructions
