WIP: Cache imports #2588
base: feature/import3
Conversation
```haskell
{-# LANGUAGE BlockArguments #-}
module Swarm.Language.InternCache
```
Should add a comment with a link to the source, https://chrispenner.ca/posts/intern-cache
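For reviewers unfamiliar with the linked post, here is a minimal sketch of what such an intern cache boils down to: a pair of lookup/insert actions over a shared `IORef`-backed map. All names are illustrative (and `m` is fixed to `IO`), not the PR's actual API.

```haskell
{-# LANGUAGE LambdaCase #-}
module Main where

import Data.IORef
import qualified Data.Map.Strict as M

-- Hypothetical minimal intern cache: two actions over a shared map.
data InternCache k v = InternCache
  { lookupCached :: k -> IO (Maybe v)
  , insertCached :: k -> v -> IO ()
  }

newInternCache :: Ord k => IO (InternCache k v)
newInternCache = do
  ref <- newIORef M.empty
  pure InternCache
    { lookupCached = \k -> M.lookup k <$> readIORef ref
    , insertCached = \k v -> modifyIORef' ref (M.insert k v)
    }

-- Run the expensive computation only on a cache miss.
cached :: InternCache k v -> (k -> IO v) -> k -> IO v
cached ic compute k =
  lookupCached ic k >>= \case
    Just v -> pure v
    Nothing -> do
      v <- compute k
      insertCached ic k v
      pure v

main :: IO ()
main = do
  ic <- newInternCache
  countRef <- newIORef (0 :: Int)
  let compute k = modifyIORef' countRef (+ 1) >> pure (length (k :: String))
  a <- cached ic compute "foo/bar.sw"
  b <- cached ic compute "foo/bar.sw"
  n <- readIORef countRef
  print (a, b, n) -- prints (10,10,1): computed once, reused on second lookup
```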
```haskell
    t -> throwError $ ImpureImport loc (prettyText t)

-- | Cache file contents for import location.
fileCache :: (MonadIO m) => InternCache m (ImportLoc Import.Resolved) Text
```
Why do we need to cache file contents separately from caching processed modules?
Well, parsing the file could fail, and in that case we could at least reuse the cached file contents.
But mainly it was easier for me to write as a first test. 😁
```haskell
canonicalLoc <- resolveImportLoc (unresolveImportDir parent <//> loc)
add $ S.singleton canonicalLoc

sendIO (IC.lookupCached moduleCache canonicalLoc) >>= \case
```
The big problem that I can see is that this will never consider reloading a file from disk. In other words, consider the following steps:
- Start Swarm.
- Load a scenario that imports `foo/bar.sw`.
- Edit `foo/bar.sw`.
- Reload the scenario.

You would like to get the new version of `foo/bar.sw`, of course, but you will get the old, cached version. To get the new version you would have to completely restart Swarm.
One solution to this I have considered is to add extra data to import locations that allow us to uniquely identify them in time as well as space: a "last modified" timestamp in the case of local files, and a content hash in the case of remote files.
However, there also needs to be a mechanism to identify when we are willing to update. For example, if we are loading a scenario and encounter `import https://foo` seven times while recursively loading imports, we should only fetch it over the network the first time --- even if (especially if!) the remote file changes while we are still in the process of loading. However, if we then restart the scenario, we have to fetch it over the network again to see whether it changed.
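The "fetch once per load" behavior described above could be sketched by giving each scenario load its own fresh cache, so repeated imports within one load are fetched once while a restart naturally starts empty and re-fetches. This is only a hedged illustration; `loadScenario` and `fetch` are hypothetical stand-ins, not Swarm functions.

```haskell
{-# LANGUAGE LambdaCase #-}
module Main where

import Data.IORef
import qualified Data.Map.Strict as M

-- Hypothetical per-load cache: each call to loadScenario allocates a fresh
-- map, so repeated imports of the same URL within one load fetch once,
-- while restarting the scenario (a new call) fetches again.
loadScenario :: (String -> IO String) -> [String] -> IO [String]
loadScenario fetch urls = do
  ref <- newIORef M.empty
  let fetchOnce url =
        M.lookup url <$> readIORef ref >>= \case
          Just body -> pure body -- already fetched during this load
          Nothing -> do
            body <- fetch url
            modifyIORef' ref (M.insert url body)
            pure body
  mapM fetchOnce urls

main :: IO ()
main = do
  countRef <- newIORef (0 :: Int)
  let fetch url = modifyIORef' countRef (+ 1) >> pure ("contents of " <> url)
  -- Seven imports of the same URL within one load: a single fetch.
  _ <- loadScenario fetch (replicate 7 "https://foo")
  -- Restarting the scenario fetches over the network again.
  _ <- loadScenario fetch (replicate 7 "https://foo")
  n <- readIORef countRef
  print n -- prints 2: one fetch per load
```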
Yes, I was thinking that "last modified time" would need to be tracked as well, but I was not sure how exactly.
Here is my proposal:
- Add the timestamp info only to the cache, so the import location data stays the same.
- When loading a file, use `getModificationTime` and compare it with the time in the cache:
  - If they match, reuse the value in the cache.
  - If not, overwrite the value in the cache; the previous value will never be useful again.
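A hedged sketch of that proposal, with the timestamp stored next to the cached value so the import location type is untouched. Names here (`TimedCache`, `loadWithCache`) are hypothetical; the real code would presumably live behind the `InternCache` interface rather than a bare map.

```haskell
module Main where

import Data.IORef
import Data.Time.Clock (UTCTime)
import System.Directory (getModificationTime)
import qualified Data.Map.Strict as M

-- The cache keeps the modification time alongside the value; import
-- locations themselves carry no timestamp, matching the proposal.
type TimedCache v = IORef (M.Map FilePath (UTCTime, v))

loadWithCache :: TimedCache v -> (FilePath -> IO v) -> FilePath -> IO v
loadWithCache ref load path = do
  mtime <- getModificationTime path
  cache <- readIORef ref
  case M.lookup path cache of
    -- Timestamps match: reuse the value in the cache.
    Just (t, v) | t == mtime -> pure v
    -- Miss or stale: reload and overwrite; the previous entry can
    -- never be useful again, since the file on disk has changed.
    _ -> do
      v <- load path
      modifyIORef' ref (M.insert path (mtime, v))
      pure v

main :: IO ()
main = do
  ref <- newIORef M.empty
  countRef <- newIORef (0 :: Int)
  writeFile "demo.sw" "move; move"
  let load p = modifyIORef' countRef (+ 1) >> readFile p
  _ <- loadWithCache ref load "demo.sw"
  _ <- loadWithCache ref load "demo.sw"
  n <- readIORef countRef
  print n -- prints 1: the second load hits the cache
```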
I have not given network files much thought; do they have a timestamp we could compare?
Personally, I would not mind if they required a Swarm restart to update.
> I have not given network files much thought, do they have a timestamp we could compare?
> Personally, I would not mind if they required a Swarm restart to update.
Oh, good point! I like this; it's much simpler. It seems reasonable that files fetched over the network are much less likely to change (and if they do, I might not even be expecting it). Hence it makes sense to never invalidate them from the cache.
As for files on disk, I like your proposal. I still think to be 100% correct we need to account for the fact that while, say, recursively loading all the code for a given scenario, we do not want to update a cached module even if it changes on disk in the middle of loading --- because all the imports of a given module should be consistent across a given scenario. However, I think we can make an issue for this / put it off until later. It won't happen very frequently.