Conversation

@machshev (Collaborator) commented Oct 29, 2025

These are partial steps towards the goal of converting all the deployment objects to Pydantic models, which would provide the first complete, clearly defined static interface within the flow. From there it's a lot easier to work backwards and refactor the rest of the flow.

Main steps in this PR:

  • Make all the deployment object attributes deterministic. Prior to this change, the cov_db_dirs field order changed semi-randomly, meaning it took manual review to determine whether the JSON file generated with DVSIM_DEPLOY_DUMP=true had changed due to reordering or an actual regression. Now a sha1 change is enough to warn of a breaking change.
  • Rename model_dump to dump: the Pydantic version contains intermediary fields and working data, while the dump script only exports the fields read by the scheduler, i.e. those that affect the jobs that run. Once all deployment objects have been refactored into Pydantic, we could probably go back to model_dump, or potentially have a single model of a deployable Job to feed to the scheduler and vary only the factory functions that generate them. Fundamentally, the output to the scheduler is little more than a "command to run" and environment variables to set in the subshell.
  • Add WorkspaceCfg to package up the values that were otherwise being read from the Deploy object. This means the Deploy object doesn't need to be passed around so much. This isn't necessarily the ideal end point and may disappear in later refactoring; however, for the moment it's a step towards clearer interfaces and separation between data and behaviour.
  • Initial CompileSim unit tests, capturing the construction of the key deployment outputs consumed by the scheduler. This provides a harness for later refactoring.
  • Various small non-functional improvements in the LSF launcher.

@machshev machshev force-pushed the refactor_scheduler branch 3 times, most recently from 484cbf3 to 551890f on October 30, 2025 at 15:23
@rswarbrick (Contributor) left a comment

I'm a big fan! This looks dramatically cleaner. Some small comments, but I'm really enthusiastic about how much more solid things seem to be getting.

context=[],
)

def __init__(self, deploy: "Deploy") -> None:
Contributor

Do we need the quotes here? I'd have thought that it only matters when we're type checking, in which case it's in scope?

Collaborator (Author)

I'll generally always go for the unquoted version by default. So if I've quoted them there will be a need.
There are generally two reasons for using the quoted version:

  • Circular references in imports (A --> B --> A) where one of the references is for typing only. The dependency loop can be broken by putting the import behind a typing.TYPE_CHECKING guard and quoting the type.
  • Typing of class methods that take or return an object of the class itself. In this case the class can't be used unquoted in the method args/return annotations, as the interpreter evaluates those before the class itself is defined. Not sure why it's that way; my guess is it's an optimisation thing, and making it smarter would require a second pass or something. So in this case the class name is quoted as well.
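The two cases above can be sketched together; the module and class names here are illustrative rather than the actual dvsim code:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type checking only, so a runtime circular import is
    # avoided if the `deploy` module also imports this one.
    from deploy import Deploy


class Launcher:
    def __init__(self, deploy: "Deploy") -> None:
        # Quotes required: `Deploy` is never bound at runtime here.
        self.deploy = deploy

    @classmethod
    def clone(cls, other: "Launcher") -> "Launcher":
        # Quoted self-reference: `Launcher` is not yet bound while the
        # annotations in the class body are being evaluated.
        return cls(other.deploy)
```

(From Python 3.7 onwards, `from __future__ import annotations` makes all annotations lazily evaluated strings, which removes the need for the quotes in both cases.)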

Contributor

I agree with both of those things. My question was why the quotes are needed in this particular case.

But I think I've realised: even if the type annotation doesn't do anything at runtime, all of its symbols still need to be in scope.

Collaborator (Author)

Just checked: it's because the Deploy class module imports the Launcher for typing, creating a circular reference at module level. At runtime, the interpreter just errors.

Sort the coverage database paths, while leaving the first path in place.
This should mean that the dumped deployment objects are now
deterministic. At the moment, when comparing dumps, the only difference
is often the order of these paths, which adds unnecessary noise and
makes it more difficult to check for meaningful differences.

Signed-off-by: James McCorrie <[email protected]>
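The ordering rule in this commit can be sketched as follows; the helper name is illustrative, and the field it targets is the `cov_db_dirs` list mentioned in the PR description:

```python
def sort_cov_db_dirs(paths: list[str]) -> list[str]:
    """Sort coverage database paths, keeping the first entry in place.

    The first path is significant, so it is left where it is; the rest
    are sorted so that repeated dumps of the deployment objects compare
    equal byte-for-byte.
    """
    if len(paths) < 2:
        return list(paths)
    return [paths[0], *sorted(paths[1:])]
```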
The pydantic model_dump method contains a lot more information, so
rename to allow using these in parallel until all the Deploy classes
are migrated to pydantic.

Signed-off-by: James McCorrie <[email protected]>
Reduce dependency on sim_cfg by pulling out some of the data and putting
it on a new WorkspaceCfg pydantic object.

Signed-off-by: James McCorrie <[email protected]>
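A sketch of the shape this suggests, with illustrative field names; consumers take the small, typed WorkspaceCfg rather than the whole Deploy object:

```python
from pydantic import BaseModel


class WorkspaceCfg(BaseModel):
    """Values previously read off sim_cfg/Deploy (names illustrative)."""

    project: str
    scratch_path: str
    branch: str = ""


def launcher_workdir(workspace: WorkspaceCfg, job_name: str) -> str:
    # A consumer only needs the fields it uses, not the Deploy object.
    return f"{workspace.scratch_path}/{workspace.project}/{job_name}"
```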
Add a `new()` static method to allow tests to test against the desired
interface. The intention is to convert the deployment classes to
Pydantic typed data classes. The deployment classes seem to be builders
of fairly simple data classes once the construction process has been
completed. The constructors take the data in other formats and then
filter and transform to result in a new data class (deployment object).

Pydantic data class constructors take in the final data, so we need a
function to take the original data and do the transform. Having a
`new()` static method is a fairly standard convention in languages such
as Rust. For the moment this method just delegates to the existing
constructor. The `CompileSim` unit tests make use of this new method
instead of the constructor directly, which means they will be able to
test the new code as well as the old code for A/B testing, without
having to continually modify the tests.

Signed-off-by: James McCorrie <[email protected]>
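The `new()` convention can be sketched as below (written as a classmethod, the idiomatic Python form of the factory); class and argument names are illustrative, and the real constructor does far more filtering and transforming:

```python
class CompileSim:
    def __init__(self, build_mode, sim_cfg) -> None:
        # Existing constructor: filters and transforms its inputs.
        self.name = getattr(build_mode, "name", str(build_mode))
        self.sim_cfg_name = getattr(sim_cfg, "name", str(sim_cfg))

    @classmethod
    def new(cls, build_mode, sim_cfg) -> "CompileSim":
        # For now this just delegates; later it becomes the factory that
        # turns the raw inputs into a Pydantic data class's final fields.
        return cls(build_mode, sim_cfg)
```

Tests exercise `CompileSim.new(...)` only, so swapping the constructor for a Pydantic data class later needs no test changes.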
Migrate a dictionary to use sim_cfg names as keys rather than the config
object itself, which assumed the object was hashable and created
unnecessary references.

Signed-off-by: James McCorrie <[email protected]>
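The keying change can be sketched like this (the grouping helper and dict shapes are illustrative): the keys are plain strings, so the config object need not be hashable and the dict holds no extra references to it.

```python
from collections import defaultdict


def group_by_sim_cfg_name(deploys) -> dict[str, list]:
    # Key by the sim_cfg name (a str), not the sim_cfg object itself.
    grouped: dict[str, list] = defaultdict(list)
    for item in deploys:
        grouped[item["sim_cfg_name"]].append(item)
    return dict(grouped)
```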
@machshev machshev requested a review from rswarbrick October 30, 2025 17:47
@rswarbrick (Contributor) left a comment

Whoah! Please could you apply the changes requested by the review to the commits they were talking about? Don't forget that we don't use squash-merging.

Also, it would be really helpful if you could respond to a review comment (or add a thumbs up, or not bother saying anything specifically on the comment) BUT NOT MARK IT RESOLVED. Otherwise the poor reviewer has to manually wade through all the resolved comments and look to figure out whether the requested change had actually been made. It's rather inefficient.

With a big long list of comments on a PR of mine, my approach is usually:

  • Hack away in Emacs, using force push locally and interactive rebase to address the changes the reviewer suggested.
  • If they are talking nonsense, reply to the review comment in question. ("No! That's a terrible idea!" :-D)
  • (Ideally) run the tests again locally to check that I didn't just break compilation with a typo.
  • Force-push the new version.
  • Respond to the reviewer with a "I think your review is beautiful" (or something more honest)
  • Re-request review.

If the changes are small, the reviewer can then hit the "Compare" button next to my force-push to see what I changed. To make that work, it's helpful not to rebase onto master (because then the comparison picks up all of those changes). If I really want to rebase onto master, I'm sometimes really enthusiastic and do it in two stages:

  • Rebase onto master.
  • Force-push and leave a comment on the PR saying "No functional change: I've just rebased."
  • Make whatever tweak I was trying to make.
  • Force-push again, and leave a comment on the PR saying "I did something this time"

(Of course, I normally forget to do it in that order. You can totally do it the other way around too, and make the tweak first :-D)

@machshev (Collaborator, Author)

Thanks @rswarbrick
Sure, I'll try and remember to leave comments unresolved in future. I've tended to use resolving as a checklist to track which issues I've addressed: closing the trivial ones (spelling fixes and other recommendations that were just applied) and leaving open the issues where there is discussion.

Normally I'd fix up the commits where the issues were introduced. In this case, though, the first couple of commits were also in a separate PR with some urgent bug fixes (to fix the nightly runner, which is now using the latest pypi python package; if this passes we can pin the weekly to a known-good version). Unfortunately I'd rebased this PR on that one while debugging earlier in the day; in the meantime the other PR has been merged (the nightly seems to be running okay so far), and then you reviewed before I'd had a chance to rebase on master. Hence the extra commit.

Admittedly I was lazy and didn't check all the changes to see if they needed a new commit or not. I can take another look in the morning and see if I can tidy it up a bit.
