Skip to content

Conversation

phil-opp
Copy link
Collaborator

As proposed in #1093 (review)

@phil-opp phil-opp requested a review from haixuanTao August 13, 2025 12:55
@phil-opp phil-opp force-pushed the start-dataflow-through-daemon branch from e5466ef to 115e7fc Compare August 13, 2025 13:00
@haixuanTao
Copy link
Collaborator

Hey Philipp, I understand that you might not want to expose to many public function, but, I think that it would be a better solution for now.

I genuinely don't feel comfortable adding 300+ changes that touches all the repository and adds on top of an already oversaturated chain of request-response between the node-daemon-coordinator.

I really feel the burden of adding those fixes that makes the code very hard to read and is also hard to maintain.

I'm not sure if this will be a breaking changes that will requires to bump all dora node.

I think that this change is interesting and we should probably integrate this in a broader refactoring that makes it possible for library based running dataflow, and enable things like testing?

How does that sound?

@phil-opp
Copy link
Collaborator Author

The main limitation that I'm trying to solve with this PR is to make the changes from #1093 also work when the coordinator is running on a different machine than the node. I think the nodes in our node hub should be usable in distributed dataflows too.

I genuinely don't feel comfortable adding 300+ changes

Most of these changes are just moving some functions around, e.g. from the CLI crate to our core crate. The main change in the coordinator is to put the ControlRequest::Start handling into a separate function so that we can reuse it for the new CoordinatorRequest::StartDataflow request.

adds on top of an already oversaturated chain of request-response between the node-daemon-coordinator.

I agree that this part needs some refactoring. However, I'm not sure if it's really simpler to do add the CLI into the mix. Then we have a cli1->coordinator->daemon->node->cli2->coordinator->daemon chain...

I'm not sure if this will be a breaking changes that will requires to bump all dora node.

The communication is initiated by the node, so you should only need to update the dora version if you want to use it. The existing messages are not changed.

@phil-opp
Copy link
Collaborator Author

Another advantage of this approach is that your dora-openai-websocket server would be much easier to compile, as you don't have to build the coordinator, daemon, all the CLI functionality (e.g. git and https download), the runtime, etc this way.

@phil-opp
Copy link
Collaborator Author

However, I don't feel too strongly about this. I'm also ok with merging #1093 as-is, if you're fine with the big dependency chain and the fact that it only works for same-machine deployments.

@haixuanTao haixuanTao force-pushed the make-qwen-llm-configurable branch 2 times, most recently from afd751b to 1a626ad Compare August 29, 2025 09:25
@phil-opp
Copy link
Collaborator Author

phil-opp commented Sep 3, 2025

Closing for now because of the concerns mentioned in #1101 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants