Conversation

Contributor

@rohan-uiuc rohan-uiuc commented Nov 12, 2025

PR Type

Feature

Short Description

Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This lets users launch models without manually downloading and mounting weight directories.

The code now checks whether the model weights directory exists before attempting to bind-mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from --model in vllm_args (falling back to model_name). For automatic downloads to work, users must pass the full HuggingFace model identifier (e.g., Qwen/Qwen2.5-7B-Instruct) via --model in vllm_args.
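The fallback logic described above can be sketched roughly as follows. This is an illustrative sketch only; the function and parameter names (resolve_model_source, weights_parent_dir) are assumptions for clarity, not the actual identifiers in _slurm_script_generator.py.

```python
from pathlib import Path


def resolve_model_source(
    model_name: str, weights_parent_dir: str, vllm_args: dict
) -> tuple[str, bool]:
    """Return (model argument passed to vLLM, whether to bind-mount local weights).

    Hypothetical helper mirroring the PR's behavior: prefer local weights if
    the directory exists; otherwise fall back to the HF identifier so vLLM
    downloads the model on the fly.
    """
    weights_path = Path(weights_parent_dir) / model_name
    if weights_path.exists():
        # Local weights present: use the path and bind-mount it into the container.
        return str(weights_path), True
    # No local weights: skip the bind mount and use --model from vllm_args
    # (or fall back to model_name) as the HuggingFace identifier.
    return vllm_args.get("--model", model_name), False
```

With no local directory, `resolve_model_source("Qwen2.5-7B-Instruct", "/model-weights", {"--model": "Qwen/Qwen2.5-7B-Instruct"})` yields the full HF identifier and disables the bind mount.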

Fixes #166

Tests Added

  • test_generate_server_setup_singularity_no_weights: Verifies server setup doesn't include model weights path when directory doesn't exist
  • test_generate_launch_cmd_singularity_no_local_weights: Verifies launch command uses HF model identifier when local weights are missing
  • test_generate_model_launch_script_singularity_no_weights: Verifies batch mode correctly handles missing model weights
  • All existing tests pass (28 tests in test_slurm_script_generator.py, 116+ total tests)
  • Verified end-to-end: model downloads and serves successfully from HuggingFace when local weights don't exist and --model is specified in vllm_args

@codecov-commenter

codecov-commenter commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 90.47619% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.80%. Comparing base (d11de79) to head (c68cb35).
⚠️ Report is 6 commits behind head on main.

Files with missing lines                    Patch %   Lines
vec_inf/client/_slurm_script_generator.py   90.47%    2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
- Coverage   90.83%   90.80%   -0.04%     
==========================================
  Files          14       14              
  Lines        1342     1359      +17     
==========================================
+ Hits         1219     1234      +15     
- Misses        123      125       +2     
Files with missing lines                    Coverage Δ
vec_inf/client/_slurm_templates.py          100.00% <ø> (ø)
vec_inf/client/_slurm_script_generator.py   96.77% <90.47%> (-1.06%) ⬇️


Contributor
@XkunW XkunW left a comment

Hi @rohan-uiuc, thanks for opening this; I left a few comments. Another thing worth considering is adding a check in the API to see whether a model needs to be downloaded, and if so, only allowing the download when the HF cache directory environment variable is set, so that users don't accidentally download a model to their home directory and use up all their quota.
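The suggested guard could look something like the sketch below. This is a hypothetical illustration, not code from this PR; it assumes the standard huggingface_hub environment variables HF_HOME and HF_HUB_CACHE, which control where downloads land (defaulting to ~/.cache/huggingface when unset).

```python
import os


def ensure_hf_cache_configured() -> None:
    """Refuse on-the-fly downloads unless an HF cache location is configured.

    Hypothetical guard: without HF_HOME or HF_HUB_CACHE set, huggingface_hub
    downloads into ~/.cache/huggingface, which could silently consume a
    user's home-directory quota on a shared cluster.
    """
    if not (os.environ.get("HF_HOME") or os.environ.get("HF_HUB_CACHE")):
        raise RuntimeError(
            "Model weights not found locally and no HF cache directory is "
            "configured; set HF_HOME (or HF_HUB_CACHE) to a location with "
            "sufficient quota before launching with on-the-fly downloads."
        )
```

The API could call this only on the download path, so launches with existing local weights are unaffected.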

      ],
      "imports": "source {src_dir}/find_port.sh",
-     "bind_path": f"export {CONTAINER_MODULE_NAME.upper()}_BINDPATH=${CONTAINER_MODULE_NAME.upper()}_BINDPATH,/dev,/tmp,{{model_weights_path}}{{additional_binds}}",
+     "bind_path": f"export {CONTAINER_MODULE_NAME.upper()}_BINDPATH=${CONTAINER_MODULE_NAME.upper()}_BINDPATH,$(echo /dev/infiniband* | sed -e 's/ /,/g'),/dev,/tmp{{model_weights_path}}{{additional_binds}}",
This looks like an error from merging the code? The /dev directory is already bound, so the extra handling for /dev/infiniband is no longer needed.

"""
launcher_script = ["\n"]

vllm_args_copy = self.params["vllm_args"].copy()
Not sure if this is necessary, as the model name should be parsed from the launch command, not passed as part of --vllm-args.

if self.model_weights_exists
else self.params["model_name"]
)
self.model_bind_option = (
It looks like this member variable is never used anywhere?


Development

Successfully merging this pull request may close these issues.

  • Add support for on-the-fly model downloads to llm-inference package
  • Support downloading model weights on the fly from HF
