Commit 17e98d5

Merge branch 'main' into ideogram4-lora-loader
2 parents 7e73996 + a487e2f commit 17e98d5

11 files changed

Lines changed: 1427 additions & 195 deletions

File tree

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -591,6 +591,8 @@
591591
title: PixArt-Σ
592592
- local: api/pipelines/prx
593593
title: PRX
594+
- local: api/pipelines/prx_pixel
595+
title: PRX Pixel
594596
- local: api/pipelines/qwenimage
595597
title: QwenImage
596598
- local: api/pipelines/sana
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
2+
#
3+
# Licensed under the Apache License, Version 2.0 (the "License");
4+
# you may not use this file except in compliance with the License.
5+
# You may obtain a copy of the License at
6+
#
7+
# http://www.apache.org/licenses/LICENSE-2.0
8+
#
9+
# Unless required by applicable law or agreed to in writing, software
10+
# distributed under the License is distributed on an "AS IS" BASIS,
11+
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
# See the License for the specific language governing permissions and
13+
# limitations under the License. -->
14+
15+
# PRX Pixel
16+
17+
PRXPixel is a pixel-space text-to-image generation model by Photoroom. A ~7B [`PRXTransformer2DModel`]
18+
denoises raw RGB images directly — no VAE is needed. The model is conditioned on a Qwen3-VL text encoder
19+
and uses flow matching where the transformer predicts the clean image at each step (x-prediction). The
20+
generation resolution is fed into the timestep modulation so the model is aware of the target size.
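To make the x-prediction formulation concrete, here is a minimal sketch of one common convention, not the actual PRX scheduler. It assumes the rectified-flow interpolation `x_t = (1 - t) * x0 + t * noise` with `t = 1` as pure noise, converts a clean-image prediction into a velocity, and integrates with Euler steps. `predict_clean` is a hypothetical stand-in for the transformer; the time convention, the uniform step schedule, and all function names are illustrative assumptions.

```python
import random

def euler_step_x_pred(x_t, x0_hat, t, t_next):
    """One Euler step of the flow ODE given a clean-image prediction.

    Assumes x_t = (1 - t) * x0 + t * noise, so the velocity is
    dx/dt = noise - x0 = (x_t - x0_hat) / t.
    """
    return [xt + (t_next - t) * (xt - x0) / t for xt, x0 in zip(x_t, x0_hat)]

def sample(predict_clean, noise, num_steps):
    """Integrate from t=1 (noise) to t=0 (image) on a uniform time grid."""
    x = list(noise)
    for i in range(num_steps):
        t = 1.0 - i / num_steps
        t_next = 1.0 - (i + 1) / num_steps
        x = euler_step_x_pred(x, predict_clean(x, t), t, t_next)
    return x

# Toy check with an oracle "model" that always returns the true clean signal.
target = [0.2, -0.5, 0.9]  # stand-in for flattened RGB values
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]
result = sample(lambda x, t: target, noise, num_steps=28)
```

With a perfect predictor the trajectory lands on the target, which is why the final step to `t = 0` returns the predicted clean image itself. A real model's prediction changes from step to step, so intermediate steps matter.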
21+
22+
## Available models
23+
24+
| Model | Resolution | Description | Suggested parameters | Recommended dtype |
25+
|:-----:|:---------:|:----------:|:----------:|:----------:|
26+
| [`Photoroom/prxpixel-t2i`](https://huggingface.co/Photoroom/prxpixel-t2i) | 1024 | Pixel-space ~7B model with Qwen3-VL text encoder | 28 steps, cfg=5.0 | `torch.bfloat16` |
27+
28+
## Loading the pipeline
29+
30+
[`PRXPixelPipeline`] requires `transformers >= 4.57` (the version that introduced `Qwen3VLTextModel`). Load it with [`~DiffusionPipeline.from_pretrained`]:
31+
32+
```py
33+
import torch
34+
from diffusers import PRXPixelPipeline
35+
36+
pipe = PRXPixelPipeline.from_pretrained("Photoroom/prxpixel-t2i", torch_dtype=torch.bfloat16)
37+
pipe.to("cuda")
38+
39+
prompt = "A front-facing portrait of a lion in the golden savanna at sunset."
40+
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
41+
image.save("prxpixel_output.png")
42+
```
43+
44+
## Memory optimization
45+
46+
For memory-constrained environments, offload model components to the CPU when they are not in use:
47+
48+
```py
49+
import torch
50+
from diffusers import PRXPixelPipeline
51+
52+
pipe = PRXPixelPipeline.from_pretrained("Photoroom/prxpixel-t2i", torch_dtype=torch.bfloat16)
53+
pipe.enable_model_cpu_offload()
54+
55+
# Or, instead of the call above, use sequential CPU offload for even lower memory (slower):
56+
# pipe.enable_sequential_cpu_offload()
57+
```
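As a rough guide to why offloading can matter, the sketch below estimates the memory taken by the transformer weights alone, taking the ~7B parameter count stated above at face value. The helper name is hypothetical, and the estimate ignores the text encoder, activations, and framework overhead, so actual usage will be higher.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

def weight_memory_gib(num_params, dtype):
    """Approximate weight storage in GiB for a given parameter count."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

bf16_gib = weight_memory_gib(7e9, "bfloat16")
fp32_gib = weight_memory_gib(7e9, "float32")
print(f"bfloat16: ~{bf16_gib:.1f} GiB, float32: ~{fp32_gib:.1f} GiB")
```

Under these assumptions the weights come to roughly 13 GiB in `torch.bfloat16`, half of what `float32` would need.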
58+
59+
## PRXPixelPipeline
60+
61+
[[autodoc]] PRXPixelPipeline
62+
- all
63+
- __call__
64+
65+
## PRXPipelineOutput
66+
67+
[[autodoc]] pipelines.prx.pipeline_output.PRXPipelineOutput
