Fix kohya FLUX CLIP text-encoder LoRA loading under transformers>=5 (#13984) #14014
Closed
Alex-Wengg wants to merge 1 commit into
transformers>=5 flattened CLIPTextModel (dropped the `text_model.` wrapper module), so `text_encoder.named_modules()` no longer carries that prefix while kohya-converted LoRA keys still do. The rank dict in `_load_lora_into_text_encoder` therefore came out empty and loading crashed with `IndexError` in get_peft_kwargs. Strip the stale `text_model.` prefix when the text encoder is flattened so the converted keys align with the module names. Adds a fast CPU regression test. Fixes huggingface#13984 Co-authored-by: Claude <noreply@anthropic.com>
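The failure mode can be reproduced without either library. The sketch below is a simplified stand-in, not the diffusers code: `build_rank`, the module-name list, and the integer "ranks" stored in place of tensors are all hypothetical, chosen only to show why a prefix mismatch leaves the rank dict empty and why indexing its first value then raises.

```python
# Hypothetical stand-ins: module names as a flattened CLIPTextModel would report
# them, and converted LoRA keys that still carry the `text_model.` prefix.
flattened_module_names = [
    "encoder.layers.0.self_attn.k_proj",
    "encoder.layers.0.self_attn.q_proj",
]
converted_keys = {
    # Values stand in for tensors; here they hold the LoRA rank directly.
    "text_model.encoder.layers.0.self_attn.k_proj.lora_A.weight": 4,
    "text_model.encoder.layers.0.self_attn.k_proj.lora_B.weight": 4,
}


def build_rank(module_names, keys):
    """Collect one rank per module whose lora_B key exists (simplified matching)."""
    rank = {}
    for name in module_names:
        rank_key = f"{name}.lora_B.weight"
        if rank_key in keys:
            rank[rank_key] = keys[rank_key]
    return rank


rank = build_rank(flattened_module_names, converted_keys)
print(rank)  # {} because no module name carries the `text_model.` prefix

try:
    first_rank = list(rank.values())[0]
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```

With module names that still include `text_model.` (the pre-flattening layout), the same helper finds the `lora_B` key and the lookup succeeds.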
What does this PR do?
Fixes #13984.
Loading a kohya-style FLUX LoRA that contains CLIP text-encoder weights (`lora_te1_*`) crashes with `IndexError: list index out of range` under `transformers>=5`.
Root cause
`transformers>=5` flattened `CLIPTextModel`: the `text_model.` wrapper module was removed, so `text_encoder.named_modules()` now yields `encoder.layers.0.self_attn.k_proj` instead of `text_model.encoder.layers.0.self_attn.k_proj`.
The kohya→diffusers conversion (`_convert_kohya_flux_lora_to_diffusers`) still emits text-encoder keys prefixed with `text_model.` (e.g. `text_encoder.text_model.encoder.layers.0.self_attn.k_proj...`). In `_load_lora_into_text_encoder`, the `rank` dict is built by matching `named_modules()` against the converted keys. Under transformers 5 nothing matches, `rank` stays empty, and `get_peft_kwargs` does `list(rank_dict.values())[0]` → `IndexError`.
Fix
In `_load_lora_into_text_encoder`, when the text encoder is flattened (has no `text_model` submodule), strip the stale `text_model.` prefix from the converted state-dict keys so they align with the module names. The check is keyed on the actual model structure (`hasattr(text_encoder, "text_model")`), not a transformers version number, and is a no-op for encoders/LoRAs that don't carry the prefix (e.g. T5, native diffusers LoRAs).
Tests
Added a fast CPU regression test, `FluxLoRATests::test_kohya_clip_text_encoder_lora_loads_with_flattened_clip` (the existing kohya+TE coverage is a `@slow @nightly @require_big_accelerator` integration test, so the regression went unnoticed). It builds a tiny flattened `CLIPTextModel`, runs a synthetic kohya CLIP LoRA through the real converter + loader, and asserts the adapter is injected.
The test fails on `main` (`IndexError`) and passes with this fix.
`ruff check` / `ruff format` clean.
Who can review?
@sayakpaul @BenjaminBossan
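For reviewers who want the idea without opening the diff, the prefix-stripping step described under Fix might look roughly like the sketch below. It is an illustration, not the patch itself: `strip_stale_prefix` is a hypothetical helper name, `SimpleNamespace` objects stand in for real text encoders, and the keys are assumed to have already had any leading `text_encoder.` component removed.

```python
from types import SimpleNamespace

PREFIX = "text_model."


def strip_stale_prefix(text_encoder, state_dict):
    """Drop a leading `text_model.` from keys when the encoder has no such submodule."""
    if hasattr(text_encoder, "text_model"):
        # Legacy (wrapped) layout: keys already line up with module names.
        return state_dict
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in state_dict.items()
    }


# Stand-ins for the two layouts; only the presence of `text_model` matters here.
flattened = SimpleNamespace(encoder=object())
legacy = SimpleNamespace(text_model=object())

keys = {
    "text_model.encoder.layers.0.self_attn.k_proj.lora_B.weight": 4,
    # A key without the prefix (T5-style name) passes through unchanged.
    "encoder.block.0.layer.0.SelfAttention.q.lora_B.weight": 8,
}

stripped = strip_stale_prefix(flattened, keys)
print(sorted(stripped))
print(strip_stale_prefix(legacy, keys) is keys)  # True: untouched on the legacy layout
```

Keying the branch on `hasattr` rather than on a version string means the same code path handles both layouts, whichever transformers release produced the model object.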