Fix kohya FLUX CLIP text-encoder LoRA loading under transformers>=5 (#13984) by Jefsky · Pull Request #14029 · huggingface/diffusers

Jefsky · 2026-06-21T20:11:32Z

Root cause

transformers>=5 flattened CLIPTextModel: the text_model. wrapper module was removed, so text_encoder.named_modules() now yields names like encoder.layers.0.self_attn.k_proj instead of text_model.encoder.layers.0.self_attn.k_proj.

The kohya→diffusers conversion still emits text-encoder keys prefixed with text_model. (e.g. text_encoder.text_model.encoder.layers.0.self_attn.k_proj.lora_B.weight). In _load_lora_into_text_encoder, the rank dict is built by matching named_modules() against the converted state-dict keys — under transformers>=5 nothing matches, rank stays empty, and get_peft_kwargs does list(rank_dict.values())[0] → IndexError.
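The mismatch can be reproduced without transformers installed. The following is a minimal sketch, not the actual diffusers code: `build_rank` is a hypothetical stand-in for the rank-dict loop, the module names are written by hand, shape tuples stand in for tensors, and it assumes the leading text_encoder. prefix has already been removed from the keys by the time the matching happens.

```python
# Module names as named_modules() would report them under each layout
# (hand-written here, so no transformers import is needed).
legacy_modules = ["text_model.encoder.layers.0.self_attn.k_proj"]
flattened_modules = ["encoder.layers.0.self_attn.k_proj"]

# One converted kohya key. The value is a shape tuple standing in for a
# tensor; lora_B is assumed to have shape (out_features, rank).
state_dict = {
    "text_model.encoder.layers.0.self_attn.k_proj.lora_B.weight": (768, 4),
}


def build_rank(module_names, lora_state_dict):
    """Hypothetical stand-in for the rank-dict loop: keep the rank of every
    module that has a matching lora_B key in the state dict."""
    rank = {}
    for name in module_names:
        key = f"{name}.lora_B.weight"
        if key in lora_state_dict:
            rank[key] = lora_state_dict[key][1]
    return rank


legacy_rank = build_rank(legacy_modules, state_dict)        # one match
flattened_rank = build_rank(flattened_modules, state_dict)  # no match

# Taking the first value of an empty rank dict is what raises the error.
error_message = None
try:
    first_rank = list(flattened_rank.values())[0]
except IndexError as exc:
    error_message = str(exc)
```

With the legacy names the key is found and the rank is recorded; with the flattened names the lookup misses every key, so the dict stays empty and indexing its values fails.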

The PEFT-side fix (#3212) doesn't help here because the crash happens before any PEFT state-dict injection (confirmed by the issue reporter).

Fix

In _load_lora_into_text_encoder, after the convert_state_dict_to_peft call, strip the stale text_model. prefix from the converted state-dict keys when the text encoder doesn't have the text_model submodule (i.e. the transformers>=5 layout). The check uses hasattr(text_encoder, "text_model"), not a version number, so it's forward-compatible.
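A minimal sketch of the idea, under stated assumptions: `strip_stale_text_model_prefix` is a hypothetical helper name (the PR applies the logic inline in _load_lora_into_text_encoder), and the two encoder classes are dummies that only model whether the text_model attribute exists.

```python
def strip_stale_text_model_prefix(text_encoder, state_dict, prefix="text_model."):
    """Hypothetical helper: drop the leading prefix from state-dict keys,
    but only when the encoder has no `text_model` submodule."""
    if hasattr(text_encoder, "text_model"):
        # Legacy layout: the keys already line up with named_modules().
        return state_dict
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


class LegacyEncoder:
    """Dummy for the transformers<5 layout (has a text_model attribute)."""
    text_model = object()


class FlattenedEncoder:
    """Dummy for the transformers>=5 layout (no text_model attribute)."""


converted = {"text_model.encoder.layers.0.self_attn.k_proj.lora_B.weight": "B"}

legacy_keys = list(strip_stale_text_model_prefix(LegacyEncoder(), converted))
flattened_keys = list(strip_stale_text_model_prefix(FlattenedEncoder(), converted))
```

Only a leading text_model. is removed, so keys that do not start with the prefix pass through unchanged on either layout.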

Test

Added FluxLoRATests::test_kohya_clip_text_encoder_flattened_compat — a fast CPU regression test that:

Builds the dummy FLUX pipeline
Removes the text_model attribute from text_encoder (simulating transformers>=5)
Passes a synthetic kohya-style state dict with the stale text_model. prefix
Asserts the adapter is correctly injected (no IndexError)

Verification

Before fix: the synthetic state dict with the text_model. prefix raises IndexError: list index out of range in get_peft_kwargs
After fix: the adapter loads correctly on both the flattened layout (transformers>=5) and the traditional layout (transformers<5, where hasattr(text_encoder, "text_model") is True)

…uggingface#13984)

Under transformers>=5, CLIPTextModel was flattened: the text_model. wrapper module was removed, so named_modules() returns unprefixed names like 'encoder.layers.0.self_attn.k_proj' instead of 'text_model.encoder.layers.0.self_attn.k_proj'.

Kohya-sourced LoRA state dict keys still carry the stale 'text_model.' prefix after conversion, causing _load_lora_into_text_encoder to build an empty rank dict (nothing matches) and crash with IndexError in get_peft_kwargs.

Fix: after the PEFT state dict conversion, strip 'text_model.' from state dict keys when the encoder doesn't have the text_model submodule (the transformers>=5 layout), so they align with named_modules() output.

Added a regression test test_kohya_clip_text_encoder_flattened_compat that simulates the flattened CLIPTextModel layout and passes synthetic kohya-style keys.

BenjaminBossan

Thanks for the PR. I agree that it should fix the specific issue that was mentioned and that it should be safe to apply. My concern is that it is very specialized to this case and is not a general solution. The PR covers this exact case:

https://github.com/huggingface/transformers/blob/bfd3604d83e84d7ff8bbc18bc09c21e8282d31f9/src/transformers/conversion_mapping.py#L608

But there could be other entries in the conversion mapping, now or added in the future, which are not covered by this patch. So I wonder if we should instead call Transformers' get_model_conversion_mapping on the model (if it's a Transformers model) and apply the conversions from there.

I'll leave it up to the Diffusers maintainers to decide how to deal with this.
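For comparison with the single hard-coded prefix, the general shape of a mapping-driven rename could look like the sketch below. It does not call get_model_conversion_mapping, whose return format is not shown in this thread; `RENAME_RULES` and `apply_renames` are hypothetical names, and the rules are assumed to be plain (regex pattern, replacement) pairs.

```python
import re

# Hypothetical rename rules as (regex pattern, replacement) pairs. A real
# integration would derive these from the Transformers conversion mapping
# rather than writing them by hand.
RENAME_RULES = [
    (r"^text_model\.", ""),
]


def apply_renames(state_dict, rules):
    """Apply every rename rule, in order, to each state-dict key."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key = re.sub(pattern, replacement, new_key)
        renamed[new_key] = value
    return renamed


converted = {
    "text_model.encoder.layers.0.self_attn.k_proj.lora_A.weight": "A",
    "text_model.encoder.layers.0.self_attn.k_proj.lora_B.weight": "B",
}
renamed = apply_renames(converted, RENAME_RULES)
```

Covering another entry in the mapping would then mean adding a rule to the list rather than another special case in the loader.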

Jefsky mentioned this pull request Jun 21, 2026

FLUX LoRA with CLIP text-encoder weights fails (empty rank -> IndexError) under transformers>=5 #13984

Open

github-actions bot added the size/M (PR with diff < 200 LOC), lora, tests, and fixes-issue labels and removed the size/M (PR with diff < 200 LOC) label on Jun 21, 2026

BenjaminBossan reviewed Jun 22, 2026

Jefsky wants to merge 1 commit into huggingface:main from Jefsky:fix/kohya-flux-clip-te-lora-transformers5
