Fix kohya FLUX CLIP text-encoder LoRA loading under transformers>=5 (#13984) #14014

Closed
Alex-Wengg wants to merge 1 commit into huggingface:main from Alex-Wengg:fix/13984-kohya-clip-lora-flattened-transformers5

What does this PR do?

Fixes #13984.

Loading a kohya-style FLUX LoRA that contains CLIP text-encoder weights (lora_te1_*) crashes with IndexError: list index out of range under transformers>=5:

File ".../diffusers/utils/peft_utils.py", line 158, in get_peft_kwargs
    r = lora_alpha = list(rank_dict.values())[0]
IndexError: list index out of range

Root cause

transformers>=5 flattened CLIPTextModel: the text_model. wrapper module was removed, so text_encoder.named_modules() now yields encoder.layers.0.self_attn.k_proj instead of text_model.encoder.layers.0.self_attn.k_proj.
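The naming difference can be illustrated without loading a real model. The sketch below uses nested dicts as stand-ins for module trees and a hypothetical `named_modules` helper that only mimics the dotted-name scheme; it is not transformers or torch code, and the tiny layer layout is an assumption made for illustration.

```python
# Illustrative only: nested dicts stand in for module trees.
# `named_modules` here is a hypothetical helper, not the torch method.

def named_modules(tree, prefix=""):
    """Yield a dotted name for every node below the root."""
    for name, child in tree.items():
        full = f"{prefix}.{name}" if prefix else name
        yield full
        yield from named_modules(child, full)

attention = {"self_attn": {"k_proj": {}, "q_proj": {}}}
flattened = {"encoder": {"layers": {"0": attention}}}  # layout without the wrapper
nested = {"text_model": flattened}                     # layout with the wrapper

old_names = set(named_modules(nested))
new_names = set(named_modules(flattened))

target = "encoder.layers.0.self_attn.k_proj"
print(f"text_model.{target}" in old_names)  # name carries the wrapper prefix
print(target in new_names)                  # same module, prefix gone
print(f"text_model.{target}" in new_names)  # prefixed name no longer exists
```

Any lookup that still uses the prefixed name finds nothing in the flattened layout.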

The kohya→diffusers conversion (_convert_kohya_flux_lora_to_diffusers) still emits text-encoder keys prefixed with text_model. (e.g. text_encoder.text_model.encoder.layers.0.self_attn.k_proj...). In _load_lora_into_text_encoder, the rank dict is built by matching named_modules() against the converted keys. Under transformers>=5 nothing matches, so the rank dict stays empty, and get_peft_kwargs then evaluates list(rank_dict.values())[0], which raises IndexError.
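A simplified reproduction of that failure mode, using plain Python data instead of tensors. `build_rank_dict` is a hypothetical stand-in for the matching step, not the actual diffusers implementation; the key suffix `lora_B.weight` and the shapes are assumptions chosen for illustration.

```python
def build_rank_dict(module_names, lora_shapes):
    """Map each matched LoRA key to its rank (simplified stand-in)."""
    rank = {}
    for name in module_names:
        key = f"{name}.lora_B.weight"
        if key in lora_shapes:
            # lora_B has shape (out_features, rank), so the rank is dim 1.
            rank[key] = lora_shapes[key][1]
    return rank

# Converted kohya key still carries the `text_model.` prefix.
converted = {"text_model.encoder.layers.0.self_attn.k_proj.lora_B.weight": (32, 4)}

nested_names = ["text_model.encoder.layers.0.self_attn.k_proj"]
flattened_names = ["encoder.layers.0.self_attn.k_proj"]

rank_ok = build_rank_dict(nested_names, converted)
rank_empty = build_rank_dict(flattened_names, converted)

first_rank = list(rank_ok.values())[0]  # works when names match

try:
    list(rank_empty.values())[0]  # same expression on an empty dict
    failure = None
except IndexError as exc:
    failure = str(exc)

print(first_rank, rank_empty, failure)
```

The crash surfaces in the rank lookup, but the underlying problem is the key mismatch one step earlier.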

Fix

In _load_lora_into_text_encoder, when the text encoder is flattened (has no text_model submodule), strip the stale text_model. prefix from the converted state-dict keys so they align with the module names. The check is keyed on the actual model structure (hasattr(text_encoder, "text_model")), not a transformers version number, and is a no-op for encoders/LoRAs that don't carry the prefix (e.g. T5, native diffusers LoRAs).
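A minimal sketch of the prefix-stripping idea, assuming the keys reaching this point begin with `text_model.`. The helper name `strip_stale_prefix` and the two dummy encoder classes are hypothetical and exist only for illustration; the actual change lives inside `_load_lora_into_text_encoder` and may be structured differently.

```python
STALE_PREFIX = "text_model."

def strip_stale_prefix(state_dict, text_encoder):
    """Drop a leading `text_model.` from keys when the encoder has no such submodule."""
    if hasattr(text_encoder, "text_model"):
        # Wrapper still present: the prefixed keys are correct as they are.
        return state_dict
    return {
        (key[len(STALE_PREFIX):] if key.startswith(STALE_PREFIX) else key): value
        for key, value in state_dict.items()
    }

class NestedEncoder:
    """Dummy encoder that still exposes a `text_model` attribute."""
    text_model = object()

class FlattenedEncoder:
    """Dummy encoder without the wrapper."""

lora = {
    "text_model.encoder.layers.0.self_attn.k_proj.lora_A.weight": "clip",
    "encoder.block.0.layer.0.SelfAttention.q.lora_A.weight": "t5-style",
}

fixed = strip_stale_prefix(lora, FlattenedEncoder())
untouched = strip_stale_prefix(lora, NestedEncoder())
print(sorted(fixed))
```

Keying the check on `hasattr` rather than a version number means keys without the prefix pass through unchanged, which matches the no-op behavior described above.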

Tests

Added a fast CPU regression test FluxLoRATests::test_kohya_clip_text_encoder_lora_loads_with_flattened_clip (the existing kohya+TE coverage is a @slow @nightly @require_big_accelerator integration test, so the regression went unnoticed). It builds a tiny flattened CLIPTextModel, runs a synthetic kohya CLIP LoRA through the real converter + loader, and asserts the adapter is injected.

$ python -m pytest tests/lora/test_lora_layers_flux.py::FluxLoRATests \
    -k "kohya_clip or text_lora"
10 passed
  • The new test fails on main (IndexError) and passes with this fix.
  • ruff check / ruff format clean.

AI assistance (Claude) was used to investigate and draft this change; the diff and tests were reviewed and run locally.

Who can review?

@sayakpaul @BenjaminBossan

transformers>=5 flattened CLIPTextModel (dropped the `text_model.` wrapper
module), so `text_encoder.named_modules()` no longer carries that prefix while
kohya-converted LoRA keys still do. The rank dict in `_load_lora_into_text_encoder`
therefore came out empty and loading crashed with `IndexError` in get_peft_kwargs.

Strip the stale `text_model.` prefix when the text encoder is flattened so the
converted keys align with the module names. Adds a fast CPU regression test.

Fixes huggingface#13984

Co-authored-by: Claude <noreply@anthropic.com>

Successfully merging this pull request may close these issues.

FLUX LoRA with CLIP text-encoder weights fails (empty rank -> IndexError) under transformers>=5