feat: add VLM_SmolVLM_Local plugin for fully local vision-language inference #2472
Open
Wanbogang wants to merge 2 commits into OpenMind:main from
Conversation
Add VLM_SmolVLM_Local plugin that runs SmolVLM2 vision-language model
directly via HuggingFace transformers — no Ollama or internet connection
required after the initial model download.
- Auto-detects GPU via torch.cuda.is_available(), falls back to CPU
- Default model SmolVLM2-256M requires less than 1GB VRAM
- Graceful degradation if transformers is not installed
- Add smolvlm optional dependency group in pyproject.toml
- Add config/smolvlm_local.json5 for fully local stack with OllamaLLM
- 16 tests, 99% coverage
Install optional dependencies with:

```
pip install om1[smolvlm]
```
Summary
Adds a new vision-language model (VLM) input plugin that runs SmolVLM2 directly via HuggingFace transformers — no Ollama, no internet connection, and no external server required after the initial model download.
Design Decisions
Why SmolVLM2-256M?
Why optional dependency?
transformers is a large package (~500 MB+). Adding it to the main dependencies would force all OM1 users to install it even if they never use this plugin. Instead, it is added as an optional group:
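A sketch of what that optional group in pyproject.toml might look like — the `smolvlm` extras name comes from the PR, but the member packages and the PEP 621 layout are assumptions about how OM1's pyproject.toml is organized:

```toml
[project.optional-dependencies]
# Installed only via `pip install om1[smolvlm]`; keeps the heavy
# transformers dependency out of the base install.
smolvlm = [
    "transformers",
    "torch",
]
```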
Users install it only when needed: `pip install om1[smolvlm]`.

If transformers is not installed, the plugin logs a clear warning and disables itself gracefully — no crash, no exception propagation.
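A minimal sketch of this optional-dependency pattern — class and method names here are hypothetical, not OM1's actual plugin code:

```python
import logging

logger = logging.getLogger(__name__)

# Optional-dependency guard: record a flag instead of failing at import time.
try:
    import transformers  # noqa: F401
    HAS_TRANSFORMERS = True
except ImportError:
    HAS_TRANSFORMERS = False


class VLMSmolVLMLocal:
    """Hypothetical sketch of the plugin's disable-itself behavior."""

    def __init__(self) -> None:
        self.enabled = HAS_TRANSFORMERS
        if not self.enabled:
            logger.warning(
                "transformers not installed; VLM_SmolVLM_Local disabled. "
                "Install with: pip install om1[smolvlm]"
            )

    def describe(self, image):
        # A disabled plugin returns None instead of raising, so no
        # exception propagates to the caller.
        if not self.enabled:
            return None
        ...
```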
GPU auto-detection
Follows the same pattern as other local plugins:
Falls back to CPU automatically if no CUDA device is available.
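In code, that device selection is just the following (a sketch; torch is guarded here so the snippet also runs where torch itself is absent):

```python
# Device auto-detection: prefer CUDA when available, otherwise fall
# back to CPU, matching the pattern used by other local plugins.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(f"SmolVLM2 device: {device}")
```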
How to Use
Use config/smolvlm_local.json5, or add the plugin to your own config.

Testing
The 1% missing coverage is `HAS_TRANSFORMERS = True` on line 26, which only executes when transformers is installed. transformers is intentionally not installed in the OM1 dev venv since it is an optional dependency.