Add generate_streaming for Streaming Audio Generation by balalofernandez · Pull Request #262 · nari-labs/dia

balalofernandez · 2025-07-22T11:01:33Z

This PR introduces the generate_streaming function, which enables streaming audio generation. As soon as audio tokens are generated by the model, the vocoder is run on the entire sequence (using the full sentence as context for best quality), and only the newly generated audio chunk is yielded. The implementation closely mirrors the existing generate function, but is kept as a separate function (without extracting shared logic) to make the review process easier.

This should hopefully add support for #11, #93, and #237, make the model usable in conversational use cases (#181), and reduce time to first byte (TTFB) (#153).
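For readers who want the idea without reading the diff, here is a minimal sketch of the "decode everything, yield only the new part" loop described above. It is not the PR's code: `stream_with_full_context` and `toy_decode` are hypothetical names, the vocoder is replaced by a toy function that emits two samples per token, and the sketch assumes that re-decoding a longer sequence leaves the already emitted samples unchanged.

```python
from typing import Callable, Iterable, Iterator, List


def stream_with_full_context(
    token_steps: Iterable[int],
    decode: Callable[[List[int]], List[float]],
    chunk_size: int,
) -> Iterator[List[float]]:
    """Re-decode the whole prefix each time, but yield only unseen samples."""
    tokens: List[int] = []
    emitted = 0          # number of audio samples already yielded
    last_yield_step = 0  # step index at which we last yielded
    for step, tok in enumerate(token_steps, start=1):
        tokens.append(tok)
        if step - last_yield_step >= chunk_size:
            audio = decode(tokens)   # full-sequence decode for context
            yield audio[emitted:]    # only the newly generated tail
            emitted = len(audio)
            last_yield_step = step
    # Flush whatever was generated after the last full chunk.
    if last_yield_step < len(tokens):
        audio = decode(tokens)
        yield audio[emitted:]


def toy_decode(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for the vocoder: two samples per token."""
    out: List[float] = []
    for t in tokens:
        out.extend([float(t), float(t)])
    return out


chunks = list(stream_with_full_context(range(5), toy_decode, chunk_size=2))
```

With this toy decoder, concatenating the yielded chunks reproduces a single full decode, which is the property a streaming consumer relies on.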

buttercrab · 2025-07-22T15:30:37Z

+                    )
+                start_time = time.time()
+
+            if current_step_idx - last_yield_step >= chunk_size:


Due to delay patterns, the first chunk is smaller than the other chunks.
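To illustrate the point, here is a small sketch of why a per-channel delay pattern shortens the first chunk. It assumes that a frame is only decodable once every channel has produced it, so the number of complete frames trails the step count by the largest delay. The function names and the delay values are illustrative assumptions, not taken from the diff.

```python
from typing import List


def complete_frames(steps_generated: int, delays: List[int]) -> int:
    """Frames for which every delayed channel has already been generated."""
    return max(0, steps_generated - max(delays))


def chunk_frame_counts(total_steps: int, chunk_size: int, delays: List[int]) -> List[int]:
    """Decodable frames released at each yield when yielding every chunk_size steps."""
    counts: List[int] = []
    prev = 0
    for step in range(chunk_size, total_steps + 1, chunk_size):
        done = complete_frames(step, delays)
        counts.append(done - prev)
        prev = done
    return counts


# Illustrative delay pattern (assumed): channel 0 undelayed, the rest delayed 8..15 steps.
delays = [0, 8, 9, 10, 11, 12, 13, 14, 15]
counts = chunk_frame_counts(total_steps=150, chunk_size=50, delays=delays)
print(counts)  # [35, 50, 50]
```

Under these assumptions, one possible way to even out the chunks would be to wait for `chunk_size + max(delays)` steps before the first yield, at the cost of a slightly later first chunk.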

buttercrab · 2025-07-22T15:31:15Z

+                for i in range(batch_size):
+                    generated_codes[i, : total_lens[i], :] = all_tokens[i]
+                lengths_Bx = torch.tensor(total_lens, device=self.device)
+                audio_chunks = self._generate_output(generated_codes, lengths_Bx)


This is inefficient. Only process the newly generated tokens.

balalofernandez · 2025-07-22

I noticed some artifacts when passing only the new tokens to the vocoder.

buttercrab · 2025-07-23

Okay, can you fix the delay pattern problem?
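The trade-off in this thread (full re-decoding is costly, decoding only the new tokens produced artifacts) can be made concrete with a sketch of one possible middle ground: decode the new tokens together with a bounded window of preceding context and keep only the tail. This is not what the PR implements. `decode_new_with_context`, the `context` parameter, and the toy decoder are hypothetical, and whether a bounded window avoids the artifacts would need to be verified by listening.

```python
from typing import Callable, List


def decode_new_with_context(
    tokens: List[int],
    num_new: int,
    context: int,
    decode: Callable[[List[int]], List[float]],
    samples_per_token: int,
) -> List[float]:
    """Decode the last num_new tokens plus up to `context` earlier tokens, keep the new tail."""
    if num_new <= 0:
        return []
    start = max(0, len(tokens) - num_new - context)
    audio = decode(tokens[start:])
    return audio[-num_new * samples_per_token:]


def tokens_decoded_full(total: int, chunk: int) -> int:
    """Total tokens passed to the decoder when re-decoding the full prefix at every yield."""
    return sum(range(chunk, total + 1, chunk))


def tokens_decoded_windowed(total: int, chunk: int, context: int) -> int:
    """Total tokens passed to the decoder when the context window is bounded."""
    return sum(min(n, chunk + context) for n in range(chunk, total + 1, chunk))


def toy_decode(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for the vocoder: two samples per token."""
    return [float(t) for t in tokens for _ in range(2)]


tail = decode_new_with_context(list(range(10)), num_new=3, context=4,
                               decode=toy_decode, samples_per_token=2)
full_cost = tokens_decoded_full(total=1000, chunk=50)
windowed_cost = tokens_decoded_windowed(total=1000, chunk=50, context=100)
```

In this toy setting the full-prefix strategy pushes 10500 tokens through the decoder for a 1000-step generation, against 2850 for a 100-token window, which shows why the reviewer flags the cost.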

Siddharth0207 · 2025-08-19T04:20:33Z

Thanks a ton!

wwang1110 · 2025-08-22T21:43:34Z

Thanks, when will this feature be released?

nlpkiddo-2001 · 2025-09-24T07:11:17Z

When can we expect this feature?

amal5haji · 2025-12-26T17:32:52Z

any update?

balalofernandez added 13 commits July 15, 2025 13:38

First changes

90535c9

Works without overlap

a6c7428

Remove _process_audio_chunk

64dcc46

Give more context to the vocoder

0643a80

Simplify streaming example

13f2192

Removed unnecessary dependencies

05db019

Add tokens/second metric

a80e2e3

Remove prefilled part

ef51c58

Allow batch processing

0ee015d

Remove old dependencies

213fe31

Generate a dialog with voice clone

ffd5dff

Fix unused dependencies

ccecf75

Fix ruff

24ab5c2

buttercrab suggested changes Jul 22, 2025

Awaiskhan404 approved these changes Aug 19, 2025

balalofernandez wants to merge 13 commits into nari-labs:main from balalofernandez:streaming-implementation

7 participants
