fix rf time scheduler problem #14011

Open
TheLovesOfLadyPurple wants to merge 1 commit into huggingface:main from TheLovesOfLadyPurple:main

@TheLovesOfLadyPurple

@TheLovesOfLadyPurple TheLovesOfLadyPurple commented Jun 20, 2026

What does this PR do?

This PR fixes a bug in `FlowMatchEulerDiscreteScheduler`. When the shift factor is not 1, calling `set_timesteps(num_inference_steps=2, timesteps=[1000.0, 2.99401209])` yields different timesteps than calling `set_timesteps(num_inference_steps=2)`, even though both calls produce the same sigma list.

More explicitly, before line 366 of scheduling_flow_match_euler_discrete.py, `sigmas`, `timesteps`, `self.config.num_train_timesteps`, and `self.shift` are identical in both cases; the timesteps only diverge after line 366.

In the Euler ODE/SDE solver designed for flow matching, the timestep only affects the input to the neural network; it does not affect the noise level of the current step's output, which is also the next step's input.
In the scheduler's step function, the relevant code is:

```python
    # (excerpt: the opening `if` of the per-token branch lies above this snippet)
        sigmas = self.sigmas[:, None, None]
        lower_mask = sigmas < per_token_sigmas[None] - 1e-6
        lower_sigmas = lower_mask * sigmas
        lower_sigmas, _ = lower_sigmas.max(dim=0)

        current_sigma = per_token_sigmas[..., None]
        next_sigma = lower_sigmas[..., None]
        dt = current_sigma - next_sigma
    else:
        sigma_idx = self.step_index
        sigma = self.sigmas[sigma_idx]
        sigma_next = self.sigmas[sigma_idx + 1]

        current_sigma = sigma
        next_sigma = sigma_next
        dt = sigma_next - sigma

    if self.config.stochastic_sampling:
        x0 = sample - current_sigma * model_output
        noise = randn_tensor(sample.shape, generator=generator, device=sample.device, dtype=sample.dtype)
        prev_sample = (1.0 - next_sigma) * x0 + next_sigma * noise
    else:
        prev_sample = sample + dt * model_output
```
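To illustrate that the deterministic update depends only on the sigmas, here is a minimal NumPy sketch. It is not the library code: `euler_step` is a hypothetical helper, the vectors are random stand-ins for latents, and the model is assumed to predict the exact velocity `eps - x0` of the straight path.

```python
import numpy as np


def euler_step(sample, model_output, sigma, sigma_next):
    """Deterministic flow-matching Euler update; note that no timestep label appears."""
    return sample + (sigma_next - sigma) * model_output


rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)   # stand-in for a clean latent
eps = rng.standard_normal(4)  # stand-in for the noise endpoint

# First two sigmas reported in the Logs section of this PR description.
sigma, sigma_next = 1.0, 0.008928571827709675

sample = (1.0 - sigma) * x0 + sigma * eps
velocity = eps - x0  # assumption: a perfect model on the straight path

prev_sample = euler_step(sample, velocity, sigma, sigma_next)
expected = (1.0 - sigma_next) * x0 + sigma_next * eps
print(np.allclose(prev_sample, expected))
```

Under these assumptions the step lands on the interpolation at `sigma_next` whatever timestep label was passed to the network, so the label can only matter through the model's input.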

As noted above, `sigmas`, `timesteps`, `self.config.num_train_timesteps`, and `self.shift` are identical before line 366, and only the timesteps differ afterwards. So in an inference loop such as the one at line 1061 of pipeline_stable_diffusion_3.py, if the timestep fed to the model is in-distribution in the automatic setting, the timestep in the manual setting is out-of-distribution (OOD); at least one of the two is wrong. The problem appears when users write the inference loop themselves, for example when following the tutorial at https://huggingface.co/docs/diffusers/using-diffusers/write_own_pipeline, or when writing a custom loop that uses AFS from the paper "A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models".

One might argue that this OOD behavior is a feature. But if the user provides `timesteps` directly without `num_inference_steps`, the mismatch still occurs whether or not the timesteps are linearly spaced: the `timesteps` array defines a particular noise-level sequence, yet the t label fed to the network during inference does not correspond to that noise level. That is not what the user expects.

Reproduction

When the shift factor is not 1, calling `set_timesteps(timesteps=[1000, 2.99401209])` yields different timesteps than calling `set_timesteps(num_inference_steps=2)`, but the noise levels given by the sigmas are identical.

```python
import accelerate
import torch
from accelerate.utils import set_seed
from diffusers import FlowMatchEulerDiscreteScheduler, StableDiffusion3Pipeline

accelerator = accelerate.Accelerator()
device = accelerator.device
if device.type != "cuda":
    raise RuntimeError("This script expects a CUDA device for Stable Diffusion 3 inference.")

# Assumption: accelerate.utils.set_seed stands in for the author's
# seed_everything helper, which is not shown in this snippet.
set_seed(14)
seeds = torch.randint(-2 ** 63, 2 ** 63 - 1, [accelerator.num_processes])
torch.manual_seed(seeds[accelerator.process_index].item())

dtype = torch.float16
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=dtype)
pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(pipe.scheduler.config)

# Manual path: the timesteps are passed explicitly.
pipe.scheduler.set_timesteps(timesteps=[1000.0, 2.99401209])
print(pipe.scheduler.timesteps.tolist())
print(pipe.scheduler.sigmas.tolist())

# Automatic path: only the number of inference steps is given.
pipe.scheduler.set_timesteps(2)
print(pipe.scheduler.timesteps.tolist())
print(pipe.scheduler.sigmas.tolist())
```

Logs

[1000.0, 2.9940121173858643]
[1.0, 0.008928571827709675, 0.0]
[1000.0, 8.928571701049805]
[1.0, 0.008928571827709675, 0.0]
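The logged numbers can be reproduced without loading the pipeline. The sketch below is a simplified stand-in for `set_timesteps`, not the library source: `sketch_set_timesteps` and `shift_sigmas` are hypothetical helpers, and it assumes `shift = 3.0`, `num_train_timesteps = 1000`, and that the shift is applied once when the default grid endpoints are built and once more inside `set_timesteps`.

```python
import numpy as np

NUM_TRAIN_TIMESTEPS = 1000  # assumed scheduler config value
SHIFT = 3.0                 # assumed scheduler config value


def shift_sigmas(sigmas, shift=SHIFT):
    """Shift formula quoted in this thread: shift * s / (1 + (shift - 1) * s)."""
    sigmas = np.asarray(sigmas, dtype=np.float64)
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)


def sketch_set_timesteps(timesteps=None, num_inference_steps=None):
    """Simplified stand-in: returns (timesteps, sigmas with a trailing 0)."""
    provided = timesteps is not None
    if not provided:
        # Default grid runs between the (already shifted) extreme sigmas.
        t_max = float(shift_sigmas(1.0)) * NUM_TRAIN_TIMESTEPS
        t_min = float(shift_sigmas(1.0 / NUM_TRAIN_TIMESTEPS)) * NUM_TRAIN_TIMESTEPS
        timesteps = np.linspace(t_max, t_min, num_inference_steps)
    timesteps = np.asarray(timesteps, dtype=np.float64)
    sigmas = shift_sigmas(timesteps / NUM_TRAIN_TIMESTEPS)
    if not provided:
        # Only the automatic path re-derives the timesteps from the shifted sigmas.
        timesteps = sigmas * NUM_TRAIN_TIMESTEPS
    return timesteps, np.append(sigmas, 0.0)


manual_t, manual_s = sketch_set_timesteps(timesteps=[1000.0, 2.99401209])
auto_t, auto_s = sketch_set_timesteps(num_inference_steps=2)
print(manual_t, manual_s)
print(auto_t, auto_s)
```

Under these assumptions both paths end at sigma 0.009 / 1.008 ≈ 0.0089286, while the second timestep stays at 2.994 on the manual path and becomes about 8.9286 on the automatic path, which matches the logs above.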

Fixes #14013

This PR fixes the incorrect time schedule in the Euler solver for rectified flow.

To prevent the mismatch between the noise level sigma and the timestep t during inference, the scheduler file is modified so that the timesteps always correspond to the noise level.

Who can review?

@yiyixuxu @dg845 @DN6

@github-actions github-actions Bot added size/S PR with diff < 50 LOC schedulers labels Jun 20, 2026
@TheLovesOfLadyPurple TheLovesOfLadyPurple changed the title fix rf-time scheduile problem fix rf-time scheduler problem Jun 20, 2026
@TheLovesOfLadyPurple TheLovesOfLadyPurple changed the title fix rf-time scheduler problem fix rf time scheduler problem Jun 20, 2026
@sayakpaul

Member

Could you help us point out where this happens in the original SD3 code?

@TheLovesOfLadyPurple

Author

> Could you help us point out where this happens in the original SD3 code?

I have now updated the code in issue #14013. It shows that the two ways of calling `set_timesteps` create the same sigma list, but the timesteps list that is fed to the network differs.

@TheLovesOfLadyPurple

TheLovesOfLadyPurple commented Jun 21, 2026

Author

I have removed the more complex mathematical proofs. A proof that relies only on vector operations goes as follows.

Consider a linear interpolation that starts at $x_0$ and ends at $\epsilon$. Suppose there were more than one such interpolation, described by

formula 1: $x_t = \alpha_{\sigma} x_0 + \sigma \epsilon$

and formula 2: $\hat{x}_t = \hat{\alpha}_{\sigma} x_0 + \sigma \epsilon$,

where $x_t - \hat{x}_t$ is not always the zero vector.

Because the interpolation is linear, $x_t - \epsilon = s(\sigma) v$ and $\hat{x}_t - \epsilon = s'(\sigma) v$, where the vector $v$ is the derivative of the trajectory with respect to $\sigma$, and $s$ and $s'$ are scalar functions of $\sigma$ (here $s'$ is a second function, not a derivative). Then

$(s(\sigma) - s'(\sigma)) v = x_t - \hat{x}_t = (\alpha_{\sigma} - \hat{\alpha}_{\sigma}) x_0$, with $v = \dot{\alpha}_{\sigma} x_0 + \dot{\sigma} \epsilon$,

where $\dot{\alpha}_{\sigma}$ denotes the derivative of $\alpha$ with respect to $\sigma$.

Since the interpolation starts at $x_0$ and ends at $\epsilon$, $\dot{\sigma}$ is not zero. If $\epsilon$ is not the zero vector, comparing the $\epsilon$ components of both sides gives $s(\sigma) - s'(\sigma) = 0$, and therefore $\alpha_{\sigma} - \hat{\alpha}_{\sigma} = 0$. This contradicts the assumption that $x_t - \hat{x}_t$ is not always the zero vector.

So, for the non-causal ODE used in training, there is only one formula, the one given by rectified flow:
$(1-\sigma) x_0 + \sigma \epsilon$.

In the Euler ODE solver, `dt = sigma_next - sigma` or `dt = current_sigma - next_sigma`. If the timestep is meant to be consistent with this `dt`, then there is likewise only one non-causal ODE with respect to t, which is
$(1-t) x_0 + t \epsilon$.

@TheLovesOfLadyPurple

Author

The SD3 paper defines the shifted sigmas as `sigmas = self.shift * sigmas / (1 + (self.shift - 1) * sigmas)`.
The shifted sigmas still take the values 1 and 0 at the two ends, so the premise still holds that the non-causal ODE starts at $x_0$ and points to $\epsilon$. Meanwhile, in inference, `x0 = sample - current_sigma * model_output` means the interpolation is linear.

So, if the timesteps are meant to be the series t that corresponds to the `dt` in the ODE solver, the only possible timesteps are the ones aligned with the sigmas.
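The endpoint claim is easy to check numerically. This is a small sketch of the shift formula quoted above; the helper name `shift_sigmas` and the choice `shift=3.0` are illustrative assumptions, not values taken from this comment.

```python
import numpy as np


def shift_sigmas(sigmas, shift):
    """Shift formula from the thread: shift * s / (1 + (shift - 1) * s)."""
    sigmas = np.asarray(sigmas, dtype=np.float64)
    return shift * sigmas / (1.0 + (shift - 1.0) * sigmas)


grid = np.linspace(0.0, 1.0, 11)        # unshifted sigmas from 0 to 1
shifted = shift_sigmas(grid, shift=3.0)  # assumed shift value

print(shifted[0], shifted[-1])           # endpoints stay at 0 and 1
print(bool(np.all(np.diff(shifted) > 0)))  # the map stays strictly increasing
```

For any positive shift the map fixes 0 and 1 and only moves the interior points, which is consistent with reading the shift as a change of step placement along the same straight path.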

@TheLovesOfLadyPurple

TheLovesOfLadyPurple commented Jun 21, 2026

Author

Hence, in my understanding, SD3 does not actually introduce a new interpolation method; it introduces a new way to discretize the ODE.

An interpolation method in the generalized RF family can be described by the formula \alpha_t x_1 + \beta_t \epsilon. At the same timestep t, a different interpolation method gives a different noise scale \beta_t and therefore a different SNR. Separately, there is the discretization method, which decides how far the inference/diffusion process advances at each step: for example, the first step completes 10% of the inference and the second step completes 25%. For a fixed interpolation method, a different discretization therefore gives a different SNR at the same inference step. For a linear interpolation in low dimensions the discretization may not matter much, but some trajectories have a straight part and a curved part, and there the discretization really matters. Different interpolations also have different trajectories; some of them are straight.

SD3 doesn't provide a new interpolation method, but a new discretization method.
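A small numeric sketch of this distinction, under stated assumptions: `sigma_grid` and `snr` are hypothetical helpers, the grid runs from sigma 1 down to exactly 0 (a simplification of the scheduler's grid), and SNR is taken as the squared ratio of the signal coefficient to the noise coefficient of the linear interpolation.

```python
import numpy as np


def sigma_grid(num_steps, shift):
    """Uniform grid from 1 to 0, passed through the shift formula from this thread."""
    base = np.linspace(1.0, 0.0, num_steps + 1)
    return shift * base / (1.0 + (shift - 1.0) * base)


def snr(sigma):
    """Assumed SNR of x = (1 - sigma) * x0 + sigma * eps; undefined at sigma = 0."""
    return ((1.0 - sigma) / sigma) ** 2


uniform = sigma_grid(4, shift=1.0)  # 1.0, 0.75, 0.5, 0.25, 0.0
shifted = sigma_grid(4, shift=3.0)  # 1.0, 0.9, 0.75, 0.5, 0.0

# Same interpolation formula, same step index, different noise level and SNR.
print(uniform[2], snr(uniform[2]))
print(shifted[2], snr(shifted[2]))
```

Both grids sample points on the same line between `x0` and `eps`; only the placement of the steps along that line changes, so the SNR reached at step index 2 differs (1 versus 1/9 in this sketch).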

@TheLovesOfLadyPurple

TheLovesOfLadyPurple commented Jun 21, 2026

Author

> Could you help us point out where this happens in the original SD3 code?

More explicitly, on the coding side: suppose you run SD3 in an inference loop, passing only the prompt, height, and width, and set the schedule through the scheduler in one of two ways (passing the timesteps, or passing the number of inference steps).

Then the two runs may see different timestep sequences during inference. But for the ODE/SDE solver itself, when the step function is called, the noise level of the current step's output, which is the next step's input, is identical in both runs.

That means at least one of the two ways of setting the timesteps produces a t-label sequence that is OOD during inference. More precisely, during inference a noisy feature with a particular SNR is paired with that t label as input to the network, and this inference-time (t, SNR) pair differs from the pairs seen in training.

On the mathematical side: when the `dt` in the solver is meant to be the gap between t labels in the timesteps list, the only correct continuous t label is the one equal to the sigmas.
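One way to make the mismatch measurable is to compare each t label with the sigma used at the same step. The helper below is a hypothetical diagnostic, not part of diffusers; it assumes the convention argued for in this thread, `t = sigma * num_train_timesteps` with `num_train_timesteps = 1000`, and reuses the values printed in the Logs section of the PR description.

```python
def label_sigma_gap(timesteps, sigmas, num_train_timesteps=1000):
    """Largest |t / num_train_timesteps - sigma| over the inference steps."""
    # zip stops at the shorter list, so the trailing sigma 0.0 is ignored.
    return max(abs(t / num_train_timesteps - s) for t, s in zip(timesteps, sigmas))


# Values copied from the Logs section of the PR description.
sigmas = [1.0, 0.008928571827709675, 0.0]
manual_timesteps = [1000.0, 2.9940121173858643]
auto_timesteps = [1000.0, 8.928571701049805]

manual_gap = label_sigma_gap(manual_timesteps, sigmas)
auto_gap = label_sigma_gap(auto_timesteps, sigmas)
print(manual_gap, auto_gap)
```

With these values the automatic path is aligned up to float rounding, while the manual path is off by roughly 0.0059 in sigma units on the second step.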

@github-actions

Contributor

Hi @TheLovesOfLadyPurple, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.

Labels

schedulers size/S PR with diff < 50 LOC

Development

Successfully merging this pull request may close these issues.

A bug on the set_timesteps function of the FlowMatchEulerDiscreteScheduler

2 participants