Add support for Hunyuan Image 3.0 #10576
Conversation
Nice! Will take a look 👍

Thank you, @yousef-rafat, you are a GigaChad! What are the VRAM/RAM requirements currently, and the speed per iteration? Does it support quantization?

At least 6 GB for the non-MoE part. The MoE part depends on available VRAM: the less VRAM you have, the more the generation time grows, roughly exponentially. I'm still experimenting with running the model on a single GPU, though, so no final numbers yet.

Will you implement a CPU offload mechanism like the one in llama.cpp, keeping only the active parameters in VRAM and making the rest offloadable to CPU/system RAM? Not that you have to; support itself is already amazing. If that is your plan, though, try contacting the author of the MultiGPU node; he knows a lot about memory management in ComfyUI (;

@yousef-rafat How can we playtest this? Can you give some instructions and a workflow? Do you have a ComfyUI-compatible checkpoint, or is it the raw model for now?
```python
try:
    h = h + emb_out
except:
    emb_out = emb_out.movedim(1, 2)
    h = h + emb_out
```
I think this can break compilation, and looks unsafe in general. Is it possible for you to add an explicit check?
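One way to make the check explicit, as suggested above, is to compare shapes before adding instead of catching the broadcast failure. A minimal sketch of the idea, with NumPy standing in for PyTorch (`np.moveaxis` plays the role of `Tensor.movedim`; the function name and the assumption that `h` is `(batch, seq, channels)` while `emb_out` may arrive as `(batch, channels, seq)` are illustrative, not from the PR):

```python
import numpy as np

def add_emb(h, emb_out):
    # Explicit axis check instead of a bare try/except: if the shapes
    # disagree but the ranks match, assume the last two axes are swapped
    # and realign them before broadcasting.
    if emb_out.shape != h.shape and emb_out.ndim == h.ndim:
        emb_out = np.moveaxis(emb_out, 1, 2)  # torch: emb_out.movedim(1, 2)
    return h + emb_out

h = np.zeros((2, 8, 4))
print(add_emb(h, np.ones((2, 4, 8))).shape)  # (2, 8, 4)
print(add_emb(h, np.ones((2, 8, 4))).shape)  # (2, 8, 4)
```

This keeps the control flow visible to the compiler and avoids swallowing unrelated runtime errors, which is what makes the bare `except:` unsafe.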
Added async loading and offloading of the MoE layers, giving consistent memory use with no OOM errors. It used to OOM after the third layer on a 24 GB GPU; now it runs to the end with consistent memory and minimal latency.
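The general shape of such an async offload scheme can be sketched as a double-buffered loop: while layer *i* computes, layer *i+1*'s expert weights are prefetched on a background thread, and finished layers are freed immediately. Everything below (`run_moe_layers` and the `load`/`compute`/`unload` callbacks) is a hypothetical stand-in, not the PR's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def run_moe_layers(layers, load, compute, unload):
    # Double-buffering sketch: at most ~2 layers' expert weights are
    # resident at once -- the one computing and the one being prefetched.
    pool = ThreadPoolExecutor(max_workers=1)
    pending = pool.submit(load, layers[0])   # kick off the first prefetch
    outputs = []
    for i, layer in enumerate(layers):
        weights = pending.result()           # wait for this layer's weights
        if i + 1 < len(layers):
            pending = pool.submit(load, layers[i + 1])  # prefetch the next
        outputs.append(compute(layer, weights))
        unload(layer)                        # free the finished layer
    pool.shutdown()
    return outputs

freed = []
result = run_moe_layers(
    [1, 2, 3],
    load=lambda l: l * 10,       # stand-in for a host-to-GPU copy
    compute=lambda l, w: l + w,  # stand-in for the forward pass
    unload=freed.append,         # stand-in for freeing VRAM
)
print(result)  # [11, 22, 33]
```

On a real GPU the prefetch would use a separate CUDA copy stream rather than a thread pool, but the overlap structure is the same.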
How do you convert the original Hunyuan Image 3 split checkpoint into the loadable ComfyUI format?

@kabachuha Sorry for the late response, I've been busy. I usually start by testing the model in ComfyUI format independently, just to make sure everything is working as it should, before testing it inside ComfyUI. This is the script I'm working with. The checkpoint conversion isn't hard once I know how everything should go; I'll have an update on the checkpoint loading part soon.
I added ComfyUI nodes and a method to quantize to NF4 in my repo here: https://github.com/EricRollei/Comfy_HunyuanImage3, available through ComfyUI-Manager. Hunyuan Image 3 chooses the resolution automatically based on the prompt, and the VRAM needed is strongly resolution-dependent, so if you have VRAM constraints, select a lower resolution for generation. I also have an unload-weights node so you can clear the VRAM after generation if you want to upscale or do other things in the workflow. But even with NF4 quantization you still need a ton of VRAM to avoid GPU offloading.
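For readers unfamiliar with it, the core mechanism of 4-bit codebook quantization (the idea behind NF4) can be sketched as: scale each weight block by its absmax, then snap every value to the nearest of 16 fixed code values. The uniform codebook below is a simplification for illustration; real NF4 places its codes at quantiles of a normal distribution:

```python
import numpy as np

# Simplified 4-bit codebook (uniform; actual NF4 uses normal quantiles).
CODE = np.linspace(-1.0, 1.0, 16)

def quantize_block(w):
    scale = np.abs(w).max()                          # per-block absmax scale
    idx = np.abs(w[:, None] / scale - CODE[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), scale               # 4-bit indices + one float

def dequantize_block(idx, scale):
    return CODE[idx] * scale

w = np.array([0.5, -0.25, 1.0, 0.0])
idx, scale = quantize_block(w)
w_hat = dequantize_block(idx, scale)
print(np.abs(w_hat - w).max() <= 1.0 / 15 + 1e-12)  # error <= half a code step
```

Storing one 4-bit index per weight plus one scale per block is what cuts memory roughly 4x versus fp16, though as noted above the activations and the non-quantized parts still need their full VRAM.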