Complete Kohya LoRA conversion for Qwen and Z-Image (#14080)

dxqb · claude · sayakpaul · web-flow · commit 6d71b76aceff · 2026-06-30T20:02:49.000+03:00
* Fix Kohya LoRA conversion for Z-Image modules whose names contain underscores

_convert_non_diffusers_z_image_lora_to_diffusers reverses Kohya's `.`->`_`
flattening with a blanket `_`->`.` split, guarded only by a small
protected-n-gram list (attention to_q/k/v/out, feed_forward) plus post-hoc
fixes for context_refiner/noise_refiner. Z-Image's other modules whose names
contain underscores were over-split: all_final_layer, all_x_embedder,
adaLN_modulation, cap_embedder and t_embedder came out as all.final.layer,
adaLN.modulation, ... and failed to load with "unexpected keys".

Extend the existing dot->underscore post-normalization to re-merge these
names, so Kohya (lora_unet_) Z-Image LoRAs load.
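
A minimal sketch of the re-merge step, runnable on its own. The fixup table
mirrors the one in the diff below; the sample keys are hypothetical and chosen
only to illustrate the over-split shape, not taken from a real checkpoint.

```python
# Illustrative sketch only: the table matches the patch, the example keys are made up.
ZIMAGE_MODULE_NAME_FIXUPS = {
    "context.refiner.": "context_refiner.",
    "noise.refiner.": "noise_refiner.",
    "adaLN.modulation.": "adaLN_modulation.",
    "all.final.layer.": "all_final_layer.",
    "all.x.embedder.": "all_x_embedder.",
    "cap.embedder.": "cap_embedder.",
    "t.embedder.": "t_embedder.",
}


def fixup_module_names(key: str) -> str:
    """Re-merge module names that the blanket `_` -> `.` split broke apart."""
    for dotted, underscored in ZIMAGE_MODULE_NAME_FIXUPS.items():
        key = key.replace(dotted, underscored)
    return key


# Hypothetical over-split keys, as they might look after the blanket split.
over_split = [
    "layers.0.adaLN.modulation.0.lora_down.weight",
    "cap.embedder.1.alpha",
    "t.embedder.mlp.0.lora_up.weight",
]
fixed = [fixup_module_names(k) for k in over_split]
for before, after in zip(over_split, fixed):
    print(f"{before} -> {after}")
```

Because the replacement runs on the full key, the same pass covers weight and
alpha entries without separate handling.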

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Fix Kohya LoRA conversion for Qwen top-level (non-block) modules

_convert_non_diffusers_qwen_lora_to_diffusers's convert_key hardcodes the
transformer_blocks prefix and assumes every lora_unet_ key lives under a block:
it strips a transformer_blocks_ prefix and re-prepends transformer_blocks.,
which collapses the top-level modules (img_in, txt_in, proj_out, norm_out.linear,
time_text_embed.timestep_embedder.linear_1/2) onto each other. They end up as
transformer_blocks..weight / ...a.down.weight and trip the 'state_dict should be
empty' guard.

Resolve these six modules via an explicit flattened->dotted map before the block
logic runs, preserving the .lora_down/.lora_up/.alpha suffix, so Kohya (lora_unet_)
Qwen LoRAs load.
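
A minimal sketch of the lookup, runnable on its own. The map mirrors the one in
the diff below; resolve_top_level is a hypothetical standalone helper (in the
patch the loop lives at the top of convert_key), and the sample keys are
illustrative, shown as they look after the lora_unet_ prefix has been removed.

```python
from typing import Optional

# Illustrative sketch only: the map matches the patch, the helper name is hypothetical.
TOP_LEVEL_MODULES = {
    "img_in": "img_in",
    "txt_in": "txt_in",
    "proj_out": "proj_out",
    "norm_out_linear": "norm_out.linear",
    "time_text_embed_timestep_embedder_linear_1": "time_text_embed.timestep_embedder.linear_1",
    "time_text_embed_timestep_embedder_linear_2": "time_text_embed.timestep_embedder.linear_2",
}


def resolve_top_level(key: str) -> Optional[str]:
    """Return the dotted name for a top-level module key, or None for block keys."""
    for flat, dotted in TOP_LEVEL_MODULES.items():
        # Match the bare module name or the name followed by a suffix such as
        # ".lora_down.weight"; the suffix is carried over unchanged.
        if key == flat or key.startswith(flat + "."):
            return dotted + key[len(flat):]
    return None


print(resolve_top_level("norm_out_linear.lora_down.weight"))
print(resolve_top_level("time_text_embed_timestep_embedder_linear_1.alpha"))
print(resolve_top_level("transformer_blocks_0_attn_to_q.lora_up.weight"))
```

Requiring either an exact match or a following "." keeps a flattened name from
matching a longer key that merely starts with the same characters.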

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
diff --git a/src/diffusers/loaders/lora_conversion_utils.py b/src/diffusers/loaders/lora_conversion_utils.py
@@ -2232,8 +2232,26 @@ def _convert_non_diffusers_qwen_lora_to_diffusers(state_dict):
     if has_lora_unet:
         state_dict = {k.removeprefix("lora_unet_"): v for k, v in state_dict.items()}
 
+        # Top-level (non-block) modules: convert_key below assumes every key lives under
+        # transformer_blocks_ and blindly strips/re-prepends that prefix, which collapses
+        # these module names onto each other. Map them explicitly before that logic runs.
+        # The flattened name -> dotted diffusers name is fixed, and the .lora_down/.lora_up/
+        # .alpha suffix is preserved.
+        top_level_modules = {
+            "img_in": "img_in",
+            "txt_in": "txt_in",
+            "proj_out": "proj_out",
+            "norm_out_linear": "norm_out.linear",
+            "time_text_embed_timestep_embedder_linear_1": "time_text_embed.timestep_embedder.linear_1",
+            "time_text_embed_timestep_embedder_linear_2": "time_text_embed.timestep_embedder.linear_2",
+        }
+
         def convert_key(key: str) -> str:
             prefix = "transformer_blocks"
+            for flat, dotted in top_level_modules.items():
+                if key == flat or key.startswith(flat + "."):
+                    return dotted + key[len(flat) :]
+
             if "." in key:
                 base, suffix = key.rsplit(".", 1)
             else:
@@ -2803,12 +2821,27 @@ def normalize_out_key(k: str) -> str:
         state_dict = {k.replace("default.", ""): v for k, v in state_dict.items()}
 
     # Normalize ZImage-specific dot-separated module names to underscore form so they
-    # match the diffusers model parameter names (context_refiner, noise_refiner).
-    state_dict = {
-        k.replace("context.refiner.", "context_refiner.").replace("noise.refiner.", "noise_refiner."): v
-        for k, v in state_dict.items()
+    # match the diffusers model parameter names. convert_key blindly split every "_",
+    # so module names whose own names contain underscores (and aren't protected as the
+    # attention/feed_forward n-grams are) come out over-split here. This runs on the full
+    # key (before the weight/alpha handlers below) so it fixes .lora_A/B and .alpha alike.
+    zimage_module_name_fixups = {
+        "context.refiner.": "context_refiner.",
+        "noise.refiner.": "noise_refiner.",
+        "adaLN.modulation.": "adaLN_modulation.",
+        "all.final.layer.": "all_final_layer.",
+        "all.x.embedder.": "all_x_embedder.",
+        "cap.embedder.": "cap_embedder.",
+        "t.embedder.": "t_embedder.",
     }
 
+    def fixup_module_names(k: str) -> str:
+        for dotted, underscored in zimage_module_name_fixups.items():
+            k = k.replace(dotted, underscored)
+        return k
+
+    state_dict = {fixup_module_names(k): v for k, v in state_dict.items()}
+
     converted_state_dict = {}
     all_keys = list(state_dict.keys())
     down_key = ".lora_down.weight"