[WIP] feature/PrefGRPO-txt2img-cleaning #9

LouisRouss · 2025-09-10T20:32:30Z

Adds PrefGRPO training to Diffulab.
The implementation is written from scratch, taking inspiration from the official repository.
A lot of work remains; this is a draft PR just to open the discussion.

This PR also:

- corrects the GaussianDiffusion class and introduces the Sampler abstractions
- serves as a test and fix for the text-to-image logic. I had not had time to fix everything on master for this text-to-image code, which was written a long time ago and never tested.
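
For readers new to the GRPO side of this PR, the "advantage computation" over `n_image_per_prompt` samples mentioned in the commits can be pictured roughly as follows. This is a minimal, framework-free sketch that assumes the usual GRPO group normalization (each reward is centered and scaled within its own prompt group). The function name `group_relative_advantages` and the `eps` parameter are hypothetical and not taken from the Diffulab code, which may well differ.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, n_image_per_prompt, eps=1e-8):
    """Normalize rewards within each prompt group (hypothetical helper).

    `rewards` is a flat list ordered prompt by prompt, with
    `n_image_per_prompt` consecutive entries per prompt.
    """
    if n_image_per_prompt <= 0 or len(rewards) % n_image_per_prompt != 0:
        raise ValueError("rewards must split evenly into prompt groups")

    advantages = []
    for start in range(0, len(rewards), n_image_per_prompt):
        group = rewards[start:start + n_image_per_prompt]
        mean = fmean(group)
        std = pstdev(group)  # population std of the group
        # eps avoids a division by zero when all rewards in a group are equal
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages


if __name__ == "__main__":
    # Two prompts, two images each.
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5], n_image_per_prompt=2))
```

A group whose images all receive the same reward yields zero advantages, so it contributes no gradient signal under this normalization.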

Add RewardModel and PrefGRPORewardModel classes for reward computation

91de4b3

LouisRouss marked this pull request as draft September 10, 2025 20:33

LouisRouss added 28 commits September 11, 2025 22:50

Refactor RewardModel and PrefGRPORewardModel to enhance image handling and reward computation - Allow batch processing with multiple prompts

db14e8a

Add return_latents option to Diffuser's denoise method for latent representation output

02e1ca8

Add attribute delegation and enhanced dir() support to Diffuser class

2ff6348

Fix dtype argument in model initialization

b54d0c7

Add one_step_denoise_grpo method for GRPO training in Flow class

7e60ff7

Refactor training classes to use a common trainer and reorganize imports for clarity

b0240f0

Add GRPO support to Diffuser and Flow classes with new methods and utility updates

f60fd65

Enhance RewardModel and PrefGRPORewardModel with n_image_per_prompt support and advantage computation

2a3ffc8

Add GRPO support with new BatchData structures and update training classes for GRPO alignment

43f6cd3

fix typing

6b014f7

fix loss calculation grpo flow

fd4252a

Refactor loss computation in Flow class to use a list for step-wise losses and calculate the mean at the end

ba8f074

Refactor trainer imports and implement validation step in GRPOTrainer

10a823d

Finish GRPO training loop and fix epoch level scheduler logic

7519c8c

Add clip in reward model

2f1940e

Implement StepResult and Sampler classes for diffusion process; add Euler and EulerMaruyama methods for flow based models

d442082

adapt to abstraction sampler and clean GRPO logic

693e403

Refactor ContextEmbedder to implement properties for n_output and output_size; add PreComputedEmbedder class for handling precomputed embeddings; update SD3TextEmbedder to streamline initialization and embedding retrieval.

342982a

Refactor PrefGRPORewardModel to standardize clip model ID usage and improve type casting for image and text feature encoding.

f7c4200

Refactor sampler classes to standardize set_steps method for improved timestep handling in Euler and EulerMaruyama samplers.

dc033ab

Add DDIM and DDPM sampler implementations with step and parameter setting methods - Fix Gaussian Diffusion in general

8f17901

Refactor Flow and EulerMaruyama classes for improved parameter handling and logging

2a9847a

improve tensor handling and device compatibility in flow and euler maruyama sampler

c64db67

- Refactor model input handling in Diffuser, Flow, and GRPOTrainer classes to remove ModelInputGRPO and standardize on ModelInput, enhancing type consistency and clarity. - Refactor GRPOTrainer to take into account sampler use and grpo function merged into basic ones

2d476d4

Add a generic abstract sampler class over modelization specific sampler abstract class

d2425fb

Refactor diffusion model classes to standardize sampler initialization and improve parameter handling

0b9e159

update docstring

3c81ce9

Refactor denoise method signatures in Diffuser, Flow, and GaussianDiffusion classes to make data_shape optional, enhancing flexibility in model input handling.

854d82a

LouisRouss added 11 commits September 28, 2025 12:21

Allow MMDiT to use a context embedder without pooled embedding

89d8c90

- Add loguru dependency

d079027

- typing fix - set context embedder to eval mode and deactivate grad computation

Refactor preprocess method in DinoV2

4ae13d2

Enhance input validation in encode method of DCAE class to support additional tensor ranges and raise errors for unsupported ranges.

8838fba

Implement DDT architecture and refactor modulation classes for enhanced flexibility and context handling

ee3d906

add dinoV3 and precompute functions

562f1f3

Refactor SD3TextEmbedder to improve type casting and add attention mask to embedding outputs + fix part of the code

2b8a642

Update step method docstring in Euler and EulerMaruyama classes to reflect reverse flow matching process

5c24b79

add dependencies

20945ec

improve attn unet

9960d67

Refactor ContextEmbedder to use ContextEmbedderOutput for forward method

94c9c44

LouisRouss changed the title ~~[WIP] feature/PrefGRPO~~ [WIP] feature/PrefGRPO-txt2img-cleaning Oct 23, 2025
