
feat: add VLM_SmolVLM_Local plugin for fully local vision-language inference#2472

Open
Wanbogang wants to merge 2 commits into OpenMind:main from Wanbogang:feat/vlm-smolvlm-local

Conversation

@Wanbogang
Collaborator

Summary

Adds a new vision-language model (VLM) input plugin that runs SmolVLM2
directly via HuggingFace transformers — no Ollama, no internet connection,
and no external server required after the initial model download.

Design Decisions

Why SmolVLM2-256M?

  • Less than 1GB VRAM — runs on embedded hardware and CPU fallback
  • Apache 2.0 license — compatible with OM1's MIT license
  • Auto-downloaded from HuggingFace on first run, cached locally

Why optional dependency?

transformers is a large package (~500MB+). Adding it to the main
dependencies would force all OM1 users to install it even if they
never use this plugin. Instead, it is added as an optional group:

[project.optional-dependencies]
smolvlm = [
    "transformers>=4.52.0",
    "num2words>=0.5.14",
]

Users install it only when needed:

pip install om1[smolvlm]

If transformers is not installed, the plugin logs a clear warning
and disables itself gracefully — no crash, no exception propagation.
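A minimal sketch of such an import guard (class and attribute names here are illustrative, not the plugin's actual API; only `HAS_TRANSFORMERS` is named in this PR):

```python
import logging

# Module-level guard: detect the optional dependency once at import time.
try:
    import transformers  # noqa: F401
    HAS_TRANSFORMERS = True
except ImportError:
    HAS_TRANSFORMERS = False


class VLMSmolVLMLocal:  # hypothetical, simplified plugin class
    def __init__(self) -> None:
        if not HAS_TRANSFORMERS:
            # Warn and disable instead of raising, so OM1 keeps running.
            logging.warning(
                "transformers not installed; VLM_SmolVLM_Local disabled. "
                "Install with: pip install om1[smolvlm]"
            )
            self.enabled = False
            return
        self.enabled = True
```

The key point is that the `ImportError` is caught at module load, so instantiating the plugin never propagates an exception to the rest of OM1.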

GPU auto-detection

Follows the same pattern as other local plugins:

self.device = "cuda" if torch.cuda.is_available() else "cpu"

Falls back to CPU automatically if no CUDA device is available.
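As a sketch (the helper name is hypothetical), the one-liner above can be wrapped so device selection also degrades when torch itself is absent:

```python
import logging


def select_device() -> str:
    """Return "cuda" when a CUDA device is visible, else "cpu".

    Hypothetical helper mirroring the pattern above; also falls back
    to CPU if torch is not installed at all.
    """
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        logging.warning("torch not installed; defaulting to cpu")
        return "cpu"


print(select_device())
```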

How to Use

  1. Install optional dependencies:
pip install transformers num2words
  2. Use config/smolvlm_local.json5 or add to your config:
{
  type: "VLM_SmolVLM_Local",
  config: {
    camera_index: 0,
    model_id: "HuggingFaceTB/SmolVLM2-256M-Video-Instruct",
    prompt: "Briefly describe what you see in one or two sentences.",
  },
}
  3. Run OM1 normally — model downloads automatically on first run.

Testing

The 1% missing coverage is HAS_TRANSFORMERS = True on line 26,
which only executes when transformers is installed. This is
intentionally not installed in the OM1 dev venv since it is an
optional dependency.
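For what it's worth, the fallback branch can be exercised without uninstalling anything by simulating an absent package: setting a module's `sys.modules` entry to `None` makes a subsequent import raise `ImportError`. A sketch (names illustrative, not the actual test suite):

```python
import sys


def detect_transformers() -> bool:
    # Mirrors the plugin's module-level import guard (illustrative).
    try:
        import transformers  # noqa: F401
        return True
    except ImportError:
        return False


# Simulate transformers being absent for this process only.
saved = sys.modules.get("transformers")
sys.modules["transformers"] = None  # a None entry forces ImportError
assert detect_transformers() is False

# Restore so other code in the process is unaffected.
if saved is not None:
    sys.modules["transformers"] = saved
else:
    del sys.modules["transformers"]
```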

Add VLM_SmolVLM_Local plugin that runs SmolVLM2 vision-language model
directly via HuggingFace transformers — no Ollama or internet connection
required after the initial model download.

- Auto-detects GPU via torch.cuda.is_available(), falls back to CPU
- Default model SmolVLM2-256M requires less than 1GB VRAM
- Graceful degradation if transformers is not installed
- Add smolvlm optional dependency group in pyproject.toml
- Add config/smolvlm_local.json5 for fully local stack with OllamaLLM
- 16 tests, 99% coverage

Install optional dependencies with:
    pip install om1[smolvlm]
@Wanbogang Wanbogang requested review from a team as code owners March 13, 2026 10:24
@github-actions github-actions bot added labels dependencies, robotics, python, tests, config on Mar 13, 2026
@codecov

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 99.08257% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/inputs/plugins/vlm_smolvlm_local.py 99.08% 1 Missing ⚠️
