fea: Codex library integration #1

2-towns · 2025-10-28T05:57:00Z

marcinczenko

I am very pleased you added it to that repo - it was so much easier for me to review. Thanks!

I was a bit quick with the comments, but I think I covered a couple of important - from my perspective at least - items. The most important is lack of cancellation support in streaming and keeping Manifest from the Codex lib hidden behind CodexClient facade and instead do a full copy from CodexLib.Manifest => CodexManifest (from codex_manifest.go which you removed) in the CodexClient. It will not only be cleaner and less noisy this way (the tests will require less changes), but will make any kind re-basing much easier.

.github/workflows/ci.yml

.vscode/settings.json

marcinczenko · 2025-10-28T19:53:47Z

communities/codex_archive_downloader.go

 		return fmt.Errorf("failed to trigger archive download with CID %s: %w", cid, err)
 	}

-	if manifest.CID != cid {


The manifest type provided by Codex lib, should not be used outside of codex_client (at least this is how I envisioned it and this is a good principle in general). Most of the changes in tests are related to this type and it should not be. The reason we have CodexClient is to make sure that we hide the library specifics from the rest of the code. The things above CodexClient should know as little as possible about what CodexClient is using - the only thing we should allow to "stick out" is the construction - I could do a little bit better job here as well (e.g. a factory that will take care for correct initialization), but I decided to leave it for later - it is small change.
So, to sum up this: CodexClient (and CodexClientInterface) should not expose any types from CodexLib - inside, we should keep using its own Manifest type (the one from codex_manifest.go that has been removed) and basically copy the manifest returned by CodexLib to that manifest.

I disagree on this one.

1- It is nice to expose the CodexLib because it could be used in other places in the code (for waku specific things maybe, or file sharing feature..).
2- The configuration codex.Conf from the CodexLib is exposed in the status-go integration. If you want to keep the abstraction inside CodexClient you would need to expose the configuration inside it as well which is cumbersome. Anytime a change is applied in the CodexLib you would need to reflect it in the CodexClient as well.
3- To integrate CodexLib into status-go, the build process has been updated (by adding CGO variables and fetch command). So CodexClient will make explicit the requirement of CodexLib, it is not something hidden by the CodexClient that no one will see.

cc @emizzle

The types introduced by Codex lib can be used it in other parts, but as an external dependency they should not pollute the whole code base. It is a classical layered architecture, and leaking abstractions between layers is a generally speaking a design mistake: each layer should communicate through its own abstractions or well-defined interfaces. I agree it is not always feasible (yet often it is possible) to hide everything and introducing a new library will have impact - but this impact should be controlled and minimal. In this case, leaking construction/configuration artifacts is kind of ok, although - yes - we should also hide it. The constructor to CodexClient could take separate args for each param (especially if they are not many of them) and should not leak it. Or, if there are too many arguments to pass, we call some sort of a CodexClientFactory interface function, e.g. Create() (with no args) which instead of returning CodexClient, it returns CodexClientInterface. This way we remove the dependency on the CodexClient completely, which should all us running unit tests without depending on CodexClient. But because this leak has low impact at the moment it is easier for me to accept it. For the Manifest it is different, if you decide to remove the Codex Lib and use something else instead you are propagating lots of easy to avoid changes. We may decide to accept it, but this is easy to avoid tight-coupling to an external library (Codex), which in this case is very easy to avoid. Unless we think that Codex lib is something that should be considered a first-class citizen in status-go and you have no intention to ever considering removing it.

The way you put it does not convince me, so maybe indeed some extra rational from @emizzle or @gmega would help me to feel better about it and that this is something we want to do...

This is PoC, so I am not going to fight over it, but I perceive it as a basic design flaw....

btw we can do this if you want :

type CodexManifest = codex.Manifest

Discussed with Arnaud - we will use:

type CodexManifest = codex.Manifest

in CodexClient temporarily, but I do not feel good with it. I would prefer to shield CodexLib better - but because we have to focus now on testing it end-to-end, we try this first and let's see how this works. Ideally, we should be able to run all unit tests without depending on the CodexLib so that the tests can be run without extra CodexLib specific compile flags. If we leak it too much, potentially lots of status-go tests will depend on the CodexLib. This does not feel right and if this causes problems with running tests, I am afraid there will be no other option but to do this right - but we will see if the patterns I know transfer to go easily.

I generally agree that CodexLib should be not be leaked outside of an abstraction as much as possible. However, I believe that abstraction should not be limited to CodexClient, and instead should be an abstraction that handles interactions with the Community History archives and indices.

For example, all of the functions in manager_archive_file.go should be agnostic to torrents or codex datasets. Functions inside this module should make calls to an abstraction that deals with eg uploading/downloading index files and archives.

For example, there should only be one version of createHistoryArchive. Inside this function, there would be calls to a CommunityArchiver (eg) interface that uploads and gets archive index metadata bytes.

There would be two types that implement the CommunityArchiver interface: TorrentArchiver and CodexArchiver.

CodexArchiver would import the CodexLib package and it would generally not leak outside that module as much as possible.

In this way, there can be other modules inside status-go, eg file-sharing in chats, that also make calls to the CodexArchiver interface, and do no leak any CodexLib-specific components.

I would also like to point out that I don't think this type of abstraction is something we need to tackle immediately. Our main focus is getting local integration tests completed so we can have some validation that the integration functions as expected, in anticipation for the DST tests. We should look at this later on, when there is time, however.

communities/codex_archive_downloader_integration_test.go

Makefile

README.md

Add codex library integration

a74ac84

2-towns force-pushed the feat/go-library-integration branch from 8d1ec27 to a74ac84 Compare October 28, 2025 06:45

2-towns added 3 commits October 28, 2025 07:49

Try to fix ci

f8b673c

Increase test timeout

a97d2ea

Update ci

3d00c81

2-towns marked this pull request as ready for review October 28, 2025 07:14

marcinczenko requested changes Oct 28, 2025

View reviewed changes

2-towns and others added 4 commits October 29, 2025 05:51

Apply PR reviews

100ca35

tidy go.mod

95d5bf8

Use CodexManifest and CodexConfig types instead of exposing codex.

722d67e

Update codex version to support cancellation

b3bd9a1

fea: Codex library integration #1

Are you sure you want to change the base?

fea: Codex library integration #1

Conversation

2-towns commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marcinczenko left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marcinczenko Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

2-towns Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marcinczenko Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

2-towns Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

marcinczenko Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emizzle Oct 30, 2025

Choose a reason for hiding this comment

Uh oh!

emizzle Oct 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

2-towns commented Oct 28, 2025 •

edited

Loading

marcinczenko left a comment •

edited

Loading

marcinczenko Oct 28, 2025 •

edited

Loading

2-towns Oct 29, 2025 •

edited

Loading

marcinczenko Oct 29, 2025 •

edited

Loading

marcinczenko Oct 29, 2025 •

edited

Loading

emizzle Oct 30, 2025 •

edited

Loading