Conversation

@xsebek (Member) commented Oct 10, 2025

No description provided.

@@ -0,0 +1,80 @@
{-# LANGUAGE BlockArguments #-}
module Swarm.Language.InternCache
Member:

We should add a comment with a link to the source: https://chrispenner.ca/posts/intern-cache
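For context, the linked post describes an "intern cache": a small abstraction bundling a monadic lookup and insert over a mutable map. A minimal sketch of that idea (hypothetical, simplified names; the post's version additionally uses weak references so unused entries can be garbage collected, which is omitted here):

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')
import qualified Data.Map.Strict as M

-- | A cache is just a pair of monadic lookup/insert actions; the
-- backing store is hidden behind the record.
data InternCache m k v = InternCache
  { lookupCached :: k -> m (Maybe v)
  , insertCached :: k -> v -> m ()
  }

-- | An in-memory cache backed by a mutable strict Map.
newInternCache :: Ord k => IO (InternCache IO k v)
newInternCache = do
  ref <- newIORef M.empty
  pure
    InternCache
      { lookupCached = \k -> M.lookup k <$> readIORef ref
      , insertCached = \k v -> modifyIORef' ref (M.insert k v)
      }
```

Keeping the operations abstract like this lets the backing implementation change (weak references, timestamps) without touching call sites such as `IC.lookupCached moduleCache canonicalLoc` below.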

t -> throwError $ ImpureImport loc (prettyText t)

-- | Cache file contents for import location.
fileCache :: (MonadIO m) => InternCache m (ImportLoc Import.Resolved) Text
Member:

Why do we need to cache file contents separately from caching processed modules?

Member Author:

Well, parsing the file could fail, and in that case we could still reuse the cached file contents.

But mainly it was easier for me to write as a first test. 😁
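The reply above can be made concrete with a two-layer sketch (hypothetical names, not the PR's actual code): raw file text and parsed modules are cached separately, so a failed parse does not force re-reading the file on the next attempt.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map.Strict as M

type Module = [String]  -- stand-in for a real parsed module type

-- | Read a file through a cache of raw contents: the reader action is
-- invoked at most once per path.
cachedRead :: IORef (M.Map FilePath String)  -- ^ the "file cache"
           -> (FilePath -> IO String)        -- ^ the real reader
           -> FilePath -> IO String
cachedRead ref readF path = do
  m <- readIORef ref
  case M.lookup path m of
    Just txt -> pure txt
    Nothing -> do
      txt <- readF path
      modifyIORef' ref (M.insert path txt)
      pure txt

-- | Load a module: even if parsing fails, the file contents stay
-- cached and are reused on a later attempt.
loadModule :: IORef (M.Map FilePath String)
           -> (FilePath -> IO String)
           -> (String -> Either String Module)
           -> FilePath -> IO (Either String Module)
loadModule ref readF parse path = parse <$> cachedRead ref readF path
```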

canonicalLoc <- resolveImportLoc (unresolveImportDir parent <//> loc)
add $ S.singleton canonicalLoc

sendIO (IC.lookupCached moduleCache canonicalLoc) >>= \case
Member:

The big problem that I can see is that this will never consider reloading a file from disk. In other words, consider the following steps:

  1. Start Swarm.
  2. Load a scenario that imports foo/bar.sw.
  3. Edit foo/bar.sw.
  4. Reload the scenario.

You would like to get the new version of foo/bar.sw, of course, but you will get the old, cached version. To get the new version you would have to completely restart Swarm.

One solution to this I have considered is to add extra data to import locations that allow us to uniquely identify them in time as well as space: a "last modified" timestamp in the case of local files, and a content hash in the case of remote files.

However, there also needs to be a mechanism to identify when we are willing to update. For example, if we are loading a scenario and encounter import https://foo seven times while recursively loading imports, we should only fetch it over the network the first time --- even if (especially if!) the remote file changes while we are still in the process of loading. However, if we then restart the scenario, we have to fetch it over the network again to see whether it changed.

Member Author:

Yes, I was thinking that "last modified time" would need to be tracked as well, but I was not sure how exactly.

Here is my proposal:

  • add the timestamp info only to the cache, so the import location data stays the same
  • when loading a file, use getModificationTime and compare it with the time in the cache
    • if they match, reuse the cached value
    • if not, overwrite the cache entry; the previous value will never be useful again

I have not given network files much thought; do they have a timestamp we could compare?
Personally, I would not mind if they required a Swarm restart to update.
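The proposal above can be sketched directly (hypothetical names; the real cache API may differ): store the file's modification time alongside the cached value and revalidate it on every lookup.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import Data.Time.Clock (UTCTime)
import System.Directory (getModificationTime)
import qualified Data.Map.Strict as M

-- | A cached value stamped with the file's mtime at cache time.
data Stamped v = Stamped { stampedAt :: UTCTime, stampedVal :: v }

-- | Return the cached value only if the file has not changed since it
-- was cached; otherwise report a miss so the caller reloads and
-- overwrites the stale entry.
lookupFresh :: IORef (M.Map FilePath (Stamped v)) -> FilePath -> IO (Maybe v)
lookupFresh ref path = do
  mtime <- getModificationTime path
  entry <- M.lookup path <$> readIORef ref
  pure $ case entry of
    Just (Stamped t v) | t == mtime -> Just v
    _ -> Nothing

-- | Insert a value stamped with the file's current mtime.
insertStamped :: IORef (M.Map FilePath (Stamped v)) -> FilePath -> v -> IO ()
insertStamped ref path v = do
  mtime <- getModificationTime path
  modifyIORef' ref (M.insert path (Stamped mtime v))
```

Network files have no reliable equivalent of `getModificationTime`, which is why never invalidating them (requiring a restart instead) is the simpler option discussed below.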

@byorgey (Member) commented Oct 18, 2025:

> I have not given network files much thought, do they have a timestamp we could compare?
> Personally, I would not mind if they required a Swarm restart to update.

Oh, good point! I like this, it's much simpler. It seems reasonable that files fetched over the network are much less likely to change (and if they do, I might not even be expecting it). Hence it makes sense to never invalidate them from the cache.

As for files on disk, I like your proposal. I still think to be 100% correct we need to account for the fact that while, say, recursively loading all the code for a given scenario, we do not want to update a cached module even if it changes on disk in the middle of loading --- because all the imports of a given module should be consistent across a given scenario. However, I think we can make an issue for this / put it off until later. It won't happen very frequently.
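The consistency concern above could be handled with per-load pinning (a hypothetical sketch, not anything in the PR): during a single scenario load, record the first version of each import that gets resolved in a snapshot, and reuse the snapshot for later lookups in the same load even if the file changes on disk mid-load. The snapshot is discarded when the load finishes, so a reload sees fresh versions.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')
import qualified Data.Map.Strict as M

-- | Resolve a location at most once per load session: the underlying
-- (revalidating) loader runs only on the first request for a path;
-- every later request within the same load reuses the pinned value.
resolvePinned :: IORef (M.Map FilePath v)  -- ^ snapshot for this load
              -> (FilePath -> IO v)        -- ^ the underlying loader
              -> FilePath -> IO v
resolvePinned snap load path = do
  pinned <- M.lookup path <$> readIORef snap
  case pinned of
    Just v -> pure v
    Nothing -> do
      v <- load path
      modifyIORef' snap (M.insert path v)
      pure v
```

Allocating a fresh snapshot `IORef` per scenario load gives exactly the "import `https://foo` seven times, fetch once" behavior described earlier in the thread.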
