feat(voice-assistant): add local LoRA fine-tuning support #102
Open
5ch4um1 wants to merge 1 commit into
Add LoRA training capability to scripts/train.py via the --lora-rank/--lora-scaling flags, with gradient checkpointing on the LFM backbone, enabling fine-tuning on GPUs with as little as 7.5 GB VRAM. A new script, scripts/merge_lora_checkpoint.py, merges LoRA weights back into standard Linear layers for GGUF export. Fix check_build_tools() in scripts/quantize.py to use shutil.which() instead of invoking the which command, which on some systems exists only as a shell built-in rather than a standalone binary. Add model_dir support to _server.py and eval.py for evaluating locally quantized GGUFs without uploading to HuggingFace. Update the home-assistant README to document local GPU LoRA tuning as an alternative to Modal cloud training.
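To illustrate the check_build_tools() change described above, here is a minimal sketch of a PATH lookup built on shutil.which(). The helper name find_missing_tools and the tool names passed to it are hypothetical; the tools that scripts/quantize.py actually checks are not listed in this description.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH.

    shutil.which() does the lookup inside Python, so it does not depend
    on an external `which` executable being installed.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Hypothetical tool names, for illustration only.
    missing = find_missing_tools(["cmake", "definitely-not-a-real-tool-xyz"])
    if missing:
        print("Missing build tools:", ", ".join(missing))
```

Because the lookup never spawns a subprocess, it behaves the same whether or not a standalone which binary is installed.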
Summary
- --lora-rank/--lora-scaling flags on scripts/train.py with gradient checkpointing, enabling fine-tuning of LFM2.5-Audio-1.5B on GPUs with 7.5 GB VRAM (e.g. RTX 5050)
- scripts/merge_lora_checkpoint.py merges LoRA weights back into standard Linear layers for downstream GGUF export
- model_dir config field for evaluating locally quantized GGUFs without uploading to HuggingFace (modified _server.py, eval.py)
- scripts/quantize.py uses shutil.which() instead of the which binary
- configs/finetuned-local-q8.yaml and configs/finetuned-r32-q8.yaml for local eval runs
Results
With LoRA rank=32, 5000 steps on a local RTX 5050:
Related
Full fine-tune on A100 reference: 99.0% function, 90.2% argument. LoRA r=32 closes most of the gap while training only about 1.5% of the parameters (~22M of 1.46B).
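For readers unfamiliar with the merge step and the parameter fraction quoted above, the following dependency-free sketch shows the general idea. It assumes the common LoRA formulation y = W x + scaling * B (A x), so that the merged weight is W + scaling * (B @ A). Whether --lora-scaling is applied exactly this way in scripts/train.py is an assumption, and all function names here are hypothetical rather than taken from scripts/merge_lora_checkpoint.py.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mi * xi for mi, xi in zip(row, x)) for row in m]


def merge_lora(weight, lora_a, lora_b, scaling):
    """Return weight + scaling * (lora_b @ lora_a) as a new matrix.

    Assumed shapes: weight (out, in), lora_a (rank, in), lora_b (out, rank).
    """
    delta = matmul(lora_b, lora_a)  # shape (out, in)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def lora_param_count(in_features, out_features, rank):
    """Trainable parameters LoRA adds to one Linear layer (A plus B)."""
    return rank * (in_features + out_features)


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # out=2, in=3
    A = [[1.0, 2.0, 3.0]]                   # rank=1, in=3
    B = [[1.0], [2.0]]                      # out=2, rank=1
    merged = merge_lora(W, A, B, scaling=0.5)
    x = [1.0, 1.0, 1.0]
    # The merged layer should match the adapter applied on the fly.
    on_the_fly = [wx + 0.5 * bax
                  for wx, bax in zip(matvec(W, x), matvec(B, matvec(A, x)))]
    print(matvec(merged, x), on_the_fly)
    # Fraction quoted in this PR: ~22M trainable of 1.46B total.
    print(f"{22e6 / 1.46e9:.1%}")
```

After merging, the layer is an ordinary dense matrix again, which is why the result can be passed to tooling that expects standard Linear weights.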