Conversation

Contributor

@rohan-uiuc rohan-uiuc commented Nov 12, 2025

PR Type

Feature

Short Description

Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This lets users launch models without manually downloading and mounting weight directories.

The code now checks whether the model weights directory exists before attempting to bind-mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from --model in vllm_args (falling back to model_name). For automatic downloads to work, users must pass the full HuggingFace model identifier (e.g., Qwen/Qwen2.5-7B-Instruct) via --model in vllm_args.
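The fallback logic described above can be sketched roughly as follows. This is an illustrative sketch only; the function and parameter names (resolve_model_source, weights_parent_dir) are assumptions for clarity, not the actual identifiers in _slurm_script_generator.py.

```python
from pathlib import Path


def resolve_model_source(
    model_name: str, weights_parent_dir: str, vllm_args: dict
) -> tuple[str, bool]:
    """Return (model argument passed to vLLM, whether to bind-mount local weights).

    Hypothetical helper mirroring the PR's behavior: prefer local weights if
    the directory exists; otherwise fall back to the HF identifier so vLLM
    downloads the model on the fly.
    """
    weights_path = Path(weights_parent_dir) / model_name
    if weights_path.exists():
        # Local weights present: use the path and bind-mount it into the container.
        return str(weights_path), True
    # No local weights: skip the bind mount and use --model from vllm_args
    # (or fall back to model_name) as the HuggingFace identifier.
    return vllm_args.get("--model", model_name), False
```

With no local directory, `resolve_model_source("Qwen2.5-7B-Instruct", "/model-weights", {"--model": "Qwen/Qwen2.5-7B-Instruct"})` yields the full HF identifier and disables the bind mount.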

Fixes #166

Tests Added

  • test_generate_server_setup_singularity_no_weights: Verifies server setup doesn't include model weights path when directory doesn't exist
  • test_generate_launch_cmd_singularity_no_local_weights: Verifies launch command uses HF model identifier when local weights are missing
  • test_generate_model_launch_script_singularity_no_weights: Verifies batch mode correctly handles missing model weights
  • All existing tests pass (28 tests in test_slurm_script_generator.py, 116+ total tests)
  • Verified end-to-end: model downloads and serves successfully from HuggingFace when local weights don't exist and --model is specified in vllm_args

@codecov-commenter

codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.80%. Comparing base (d11de79) to head (c68cb35).
⚠️ Report is 6 commits behind head on main.

Files with missing lines                    Patch %   Lines
vec_inf/client/_slurm_script_generator.py   90.47%    2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
- Coverage   90.83%   90.80%   -0.04%     
==========================================
  Files          14       14              
  Lines        1342     1359      +17     
==========================================
+ Hits         1219     1234      +15     
- Misses        123      125       +2     
Files with missing lines                    Coverage Δ
vec_inf/client/_slurm_templates.py          100.00% <ø> (ø)
vec_inf/client/_slurm_script_generator.py   96.77% <90.47%> (-1.06%) ⬇️


Contributor
@XkunW XkunW left a comment

Hi @rohan-uiuc, thanks for opening this; I left a few comments. Another thing worth considering is adding a check in the API to see whether a model needs to be downloaded, and if so, only allowing the download when the HF cache directory environment variable is set, so that users don't accidentally download a model to their home directory and use up all their quota.
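The suggested guard could look something like the sketch below. This is a hypothetical illustration, not code from this PR; it assumes the standard huggingface_hub environment variables HF_HOME and HF_HUB_CACHE, which control where downloads land (defaulting to ~/.cache/huggingface when unset).

```python
import os


def ensure_hf_cache_configured() -> None:
    """Refuse on-the-fly downloads unless an HF cache location is configured.

    Hypothetical guard: without HF_HOME or HF_HUB_CACHE set, huggingface_hub
    downloads into ~/.cache/huggingface, which could silently consume a
    user's home-directory quota on a shared cluster.
    """
    if not (os.environ.get("HF_HOME") or os.environ.get("HF_HUB_CACHE")):
        raise RuntimeError(
            "Model weights not found locally and no HF cache directory is "
            "configured; set HF_HOME (or HF_HUB_CACHE) to a location with "
            "sufficient quota before launching with on-the-fly downloads."
        )
```

The API could call this only on the download path, so launches with existing local weights are unaffected.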

      ],
      "imports": "source {src_dir}/find_port.sh",
-     "bind_path": f"export {CONTAINER_MODULE_NAME.upper()}_BINDPATH=${CONTAINER_MODULE_NAME.upper()}_BINDPATH,/dev,/tmp,{{model_weights_path}}{{additional_binds}}",
+     "bind_path": f"export {CONTAINER_MODULE_NAME.upper()}_BINDPATH=${CONTAINER_MODULE_NAME.upper()}_BINDPATH,$(echo /dev/infiniband* | sed -e 's/ /,/g'),/dev,/tmp{{model_weights_path}}{{additional_binds}}",
This looks like an error from merging the code? The /dev directory is already bound, so the extra handling for /dev/infiniband is no longer needed.

"""
launcher_script = ["\n"]

vllm_args_copy = self.params["vllm_args"].copy()
Not sure if this is necessary, as the model name should be parsed from the launch command, not passed as part of --vllm-args.

if self.model_weights_exists
else self.params["model_name"]
)
self.model_bind_option = (
It looks like this member variable is never used anywhere?


Development

Successfully merging this pull request may close these issues.

  • Add support for on-the-fly model downloads to llm-inference package
  • Support downloading model weights on the fly from HF
