-
Notifications
You must be signed in to change notification settings - Fork 0
fea: Codex library integration #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
8d1ec27 to
a74ac84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very pleased you added it to that repo - it was so much easier for me to review. Thanks!
I was a bit quick with the comments, but I think I covered a couple of important - from my perspective at least - items. The most important is lack of cancellation support in streaming and keeping Manifest from the Codex lib hidden behind CodexClient facade and instead do a full copy from CodexLib.Manifest => CodexManifest (from codex_manifest.go which you removed) in the CodexClient. It will not only be cleaner and less noisy this way (the tests will require less changes), but will make any kind re-basing much easier.
| return fmt.Errorf("failed to trigger archive download with CID %s: %w", cid, err) | ||
| } | ||
|
|
||
| if manifest.CID != cid { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manifest type provided by Codex lib, should not be used outside of codex_client (at least this is how I envisioned it and this is a good principle in general). Most of the changes in tests are related to this type and it should not be. The reason we have CodexClient is to make sure that we hide the library specifics from the rest of the code. The things above CodexClient should know as little as possible about what CodexClient is using - the only thing we should allow to "stick out" is the construction - I could do a little bit better job here as well (e.g. a factory that will take care for correct initialization), but I decided to leave it for later - it is small change.
So, to sum up this: CodexClient (and CodexClientInterface) should not expose any types from CodexLib - inside, we should keep using its own Manifest type (the one from codex_manifest.go that has been removed) and basically copy the manifest returned by CodexLib to that manifest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree on this one.
1- It is nice to expose the CodexLib because it could be used in other places in the code (for waku specific things maybe, or file sharing feature..).
2- The configuration codex.Conf from the CodexLib is exposed in the status-go integration. If you want to keep the abstraction inside CodexClient you would need to expose the configuration inside it as well which is cumbersome. Anytime a change is applied in the CodexLib you would need to reflect it in the CodexClient as well.
3- To integrate CodexLib into status-go, the build process has been updated (by adding CGO variables and fetch command). So CodexClient will make explicit the requirement of CodexLib, it is not something hidden by the CodexClient that no one will see.
cc @emizzle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types introduced by Codex lib can be used it in other parts, but as an external dependency they should not pollute the whole code base. It is a classical layered architecture, and leaking abstractions between layers is a generally speaking a design mistake: each layer should communicate through its own abstractions or well-defined interfaces. I agree it is not always feasible (yet often it is possible) to hide everything and introducing a new library will have impact - but this impact should be controlled and minimal. In this case, leaking construction/configuration artifacts is kind of ok, although - yes - we should also hide it. The constructor to CodexClient could take separate args for each param (especially if they are not many of them) and should not leak it. Or, if there are too many arguments to pass, we call some sort of a CodexClientFactory interface function, e.g. Create() (with no args) which instead of returning CodexClient, it returns CodexClientInterface. This way we remove the dependency on the CodexClient completely, which should all us running unit tests without depending on CodexClient. But because this leak has low impact at the moment it is easier for me to accept it. For the Manifest it is different, if you decide to remove the Codex Lib and use something else instead you are propagating lots of easy to avoid changes. We may decide to accept it, but this is easy to avoid tight-coupling to an external library (Codex), which in this case is very easy to avoid. Unless we think that Codex lib is something that should be considered a first-class citizen in status-go and you have no intention to ever considering removing it.
The way you put it does not convince me, so maybe indeed some extra rational from @emizzle or @gmega would help me to feel better about it and that this is something we want to do...
This is PoC, so I am not going to fight over it, but I perceive it as a basic design flaw....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw we can do this if you want :
type CodexManifest = codex.ManifestThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with Arnaud - we will use:
type CodexManifest = codex.Manifestin CodexClient temporarily, but I do not feel good with it. I would prefer to shield CodexLib better - but because we have to focus now on testing it end-to-end, we try this first and let's see how this works. Ideally, we should be able to run all unit tests without depending on the CodexLib so that the tests can be run without extra CodexLib specific compile flags. If we leak it too much, potentially lots of status-go tests will depend on the CodexLib. This does not feel right and if this causes problems with running tests, I am afraid there will be no other option but to do this right - but we will see if the patterns I know transfer to go easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally agree that CodexLib should be not be leaked outside of an abstraction as much as possible. However, I believe that abstraction should not be limited to CodexClient, and instead should be an abstraction that handles interactions with the Community History archives and indices.
For example, all of the functions in manager_archive_file.go should be agnostic to torrents or codex datasets. Functions inside this module should make calls to an abstraction that deals with eg uploading/downloading index files and archives.
For example, there should only be one version of createHistoryArchive. Inside this function, there would be calls to a CommunityArchiver (eg) interface that uploads and gets archive index metadata bytes.
There would be two types that implement the CommunityArchiver interface: TorrentArchiver and CodexArchiver.
CodexArchiver would import the CodexLib package and it would generally not leak outside that module as much as possible.
In this way, there can be other modules inside status-go, eg file-sharing in chats, that also make calls to the CodexArchiver interface, and do no leak any CodexLib-specific components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also like to point out that I don't think this type of abstraction is something we need to tackle immediately. Our main focus is getting local integration tests completed so we can have some validation that the integration functions as expected, in anticipation for the DST tests. We should look at this later on, when there is time, however.
@marcinczenko