Model/Pipeline/Scheduler description
We are proposing the integration of SeFi-Image into Diffusers.
SeFi-Image is a text-to-image model family built with Semantic-First Diffusion. It separates image generation into semantic and texture latent streams, denoising the semantic structure slightly ahead of texture details so the texture stream receives a cleaner structural anchor during generation.
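To make the "semantic slightly ahead of texture" idea concrete, here is a minimal sketch of a two-stream noise schedule. It is an illustration only, not the SeFi-Image implementation: the function name `semantic_first_schedule`, the `lead` parameter, and the linear schedule are all assumptions made for this example.

```python
from typing import List, Tuple


def _clamp01(x: float) -> float:
    """Clamp a value to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def semantic_first_schedule(num_steps: int, lead: float = 0.1) -> List[Tuple[float, float]]:
    """Return (semantic_noise, texture_noise) pairs for each step boundary.

    Noise level 1.0 means pure noise and 0.0 means fully denoised.
    HYPOTHETICAL: `lead` is an illustrative offset, not a SeFi-Image parameter.
    Both streams start at pure noise; the texture stream is held back by
    `lead`, so the semantic stream is always at least as clean as texture.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    if lead < 0.0:
        raise ValueError("lead must be non-negative")

    total = 1.0 + lead  # shared clock runs a little longer so texture can finish
    schedule = []
    for i in range(num_steps + 1):
        s = i * total / num_steps
        semantic_noise = _clamp01(1.0 - s)
        texture_noise = _clamp01(total - s)
        schedule.append((semantic_noise, texture_noise))
    return schedule


if __name__ == "__main__":
    for sem, tex in semantic_first_schedule(num_steps=4, lead=0.1):
        print(f"semantic={sem:.3f} texture={tex:.3f}")
```

At every step the semantic noise level is no higher than the texture noise level, which is the property the description above relies on: the texture stream always conditions on a cleaner structural anchor.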
Key characteristics:
- Semantic-first generation: separates semantic and texture latents and advances the semantic denoising stream to improve structural consistency.
- Generation-reconstruction trade-off: combines a compact semantic latent for easier generation with a high-fidelity texture latent for reconstruction detail.
- Multiple model scales and variants: 1B/2B/5B Base checkpoints, 5B RL checkpoint, and 1B/2B/5B Turbo checkpoints.
- Few-step Turbo inference: Turbo checkpoints default to 4 denoising steps with guidance scale 1.0.
- Standard text-to-image usage: prompt-to-image generation at 1024x1024 by default.
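The Turbo defaults listed above can be written down as a small settings object. This is a sketch under stated assumptions: `GenerationDefaults` and `denoiser_forward_passes` are hypothetical names for this example, and the rule that guidance is active only when the scale exceeds 1.0 is the convention most Diffusers pipelines follow, which SeFiPipeline is assumed (not confirmed) to share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationDefaults:
    """Illustrative container for the defaults named in this proposal."""

    num_inference_steps: int
    guidance_scale: float
    height: int = 1024  # default resolution stated in the proposal
    width: int = 1024


# Values taken from the proposal: Turbo checkpoints use 4 steps, guidance 1.0.
TURBO_DEFAULTS = GenerationDefaults(num_inference_steps=4, guidance_scale=1.0)


def denoiser_forward_passes(cfg: GenerationDefaults) -> int:
    """Estimate transformer forward passes for one image.

    ASSUMPTION: classifier-free guidance is applied only when
    guidance_scale > 1.0 and then doubles the passes per step.
    """
    passes_per_step = 2 if cfg.guidance_scale > 1.0 else 1
    return cfg.num_inference_steps * passes_per_step


if __name__ == "__main__":
    print(TURBO_DEFAULTS)
    print("forward passes:", denoiser_forward_passes(TURBO_DEFAULTS))
```

Under that assumption, a Turbo run costs four transformer forward passes per image, since a guidance scale of 1.0 needs no separate unconditional pass.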
The proposed Diffusers integration adds:
- SeFiTransformer2DModel
- SeFiPipeline
- an original-checkpoint conversion script
- API documentation
- fast model and pipeline tests
The current implementation was validated with the real SeFi-Image/SeFi-Image-1B-turbo checkpoint, including conversion, 1024x1024 inference parity against a reference output, and CPU offload smoke testing.
Open source status
Provide useful links for the implementation
Additional context
This is an official integration proposal from the SeFi-Image authors/maintainers. The draft PR is already open with the implementation and validation results, and can be adjusted based on maintainer feedback on the desired scope or API shape.