feat(voice-assistant): add local LoRA fine-tuning support#102

Open
5ch4um1 wants to merge 1 commit into
Liquid4All:mainfrom
5ch4um1:add-lora

5ch4um1 commented May 13, 2026

Summary

  • Local LoRA training: New --lora-rank / --lora-scaling flags on scripts/train.py with gradient checkpointing, enabling fine-tuning of LFM2.5-Audio-1.5B on GPUs with 7.5 GB VRAM (e.g. RTX 5050)
  • LoRA merge script: scripts/merge_lora_checkpoint.py merges LoRA weights back into standard Linear layers for downstream GGUF export
  • Local eval support: model_dir config field for evaluating locally quantized GGUFs without uploading to HuggingFace (modified _server.py, eval.py)
  • Bug fix: scripts/quantize.py uses shutil.which() instead of the which binary
  • Configs: configs/finetuned-local-q8.yaml and configs/finetuned-r32-q8.yaml for local eval runs
  • Docs: home-assistant README updated with local GPU LoRA tuning section

Results

With LoRA rank=32, 5000 steps on a local RTX 5050:

| Metric | Baseline | LoRA rank-32 |
| --- | --- | --- |
| Format compliance | 0.0% | 100.0% |
| Function-name accuracy | 0.0% | 92.9% |
| Argument accuracy | 0.0% | 70.3% |

Related

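For reference, a full fine-tune on an A100 reaches 99.0% function-name accuracy and 90.2% argument accuracy. LoRA r=32 recovers most of the function-name accuracy (92.9%) and a good part of the argument accuracy (70.3%) while training only about 1.5% of the parameters (~22M of 1.46B).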
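The trainable-parameter fraction quoted above can be checked with a few lines of arithmetic. The sketch below also shows the usual per-layer LoRA cost, `rank * (d_in + d_out)`; the 2048-wide layer is a hypothetical size chosen for illustration, not a dimension taken from LFM2.5-Audio-1.5B.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Figures quoted in this PR description.
trainable_params = 22_000_000      # ~22M LoRA parameters
total_params = 1_460_000_000       # 1.46B model parameters
fraction = trainable_params / total_params
print(f"trainable fraction: {fraction:.1%}")

# Hypothetical 2048 x 2048 projection adapted at rank 32 (illustrative size only).
per_layer = lora_param_count(d_in=2048, d_out=2048, rank=32)
print(f"LoRA parameters for one such layer: {per_layer}")
```

Because the adapter cost grows linearly with the rank, halving the rank roughly halves the trainable parameters for the same set of adapted layers.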

Add LoRA training capability to scripts/train.py via --lora-rank/--lora-scaling
flags and gradient checkpointing on the LFM backbone, enabling fine-tuning
on GPUs with as little as 7.5 GB VRAM.

New scripts/merge_lora_checkpoint.py merges LoRA weights back into standard
Linear layers for GGUF export.
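
The merge step can be pictured with a small sketch. It assumes the common LoRA form, in which each adapted layer holds a frozen weight `W` plus low-rank factors `A` and `B`, and it assumes the scaling value multiplies the update directly (some implementations use `alpha / rank` instead). The helper names are hypothetical, and plain lists are used so the snippet runs without dependencies; the actual script presumably operates on torch tensors.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scaling):
    """Return W + scaling * (B @ A), i.e. the weight of an equivalent plain Linear layer.

    Assumed shapes: weight is (d_out x d_in), lora_a is (rank x d_in),
    lora_b is (d_out x rank).
    """
    delta = matmul(lora_b, lora_a)
    return [
        [w + scaling * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Tiny example: d_out=2, d_in=3, rank=1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [-1.0]]
merged = merge_lora(W, A, B, scaling=2.0)

# The merged layer must give the same output as base + adapter on any input.
x = [[1.0], [1.0], [1.0]]  # column vector
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged = [[b[0] + 2.0 * a[0]] for b, a in zip(base_out, adapter_out)]
print(matmul(merged, x) == unmerged)  # True
```

Once merged, the checkpoint contains no LoRA-specific tensors, which is why a standard export tool can consume it unchanged.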

Fix scripts/quantize.py check_build_tools() to use shutil.which() instead
of invoking the external which binary, which is absent on systems where
which exists only as a shell built-in.
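
A minimal sketch of the pattern, assuming the check only needs to know whether each tool is on `PATH`. The function name `missing_tools` and the example tool names are hypothetical and are not taken from `scripts/quantize.py`.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH.

    shutil.which() searches PATH inside the Python process, so no external
    `which` executable is required.
    """
    return [name for name in tools if shutil.which(name) is None]


# Hypothetical tool list, for illustration only.
missing = missing_tools(["cmake", "make"])
if missing:
    print("missing build tools:", ", ".join(missing))
else:
    print("all build tools found")
```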

Add model_dir support to _server.py and eval.py for evaluating locally
quantized GGUFs without uploading to HuggingFace.

Update home-assistant README to document local GPU LoRA tuning as an
alternative to Modal cloud training.