Improving the versions.json rebuild process #51

Currently, versions.json is built by taking the list of tags on the Julia repository, a list of known platforms, and the file types we expect to exist for each platform, and then trying to download every combination of version, platform, and file type. If a combination is downloadable, we record some information about it; otherwise we skip it. At the end of the build, we unconditionally upload the resulting file into "production."

There are a few limitations with this implementation:

1. Because every combination is attempted, the build is very slow: my most recent runs as of this writing took 3.5 hours each, though earlier runs had usually been more like 1.5 hours (which is still a long time). It will take even longer with each new version that's added.
2. It ignores the fact that platform support varies across versions. For example, we don't need to look for `aarch64-apple-darwin` until 1.7, or `x86_64-unknown-freebsd` until 0.6. While it's possible for someone to retroactively build binaries for a platform, that's historically been quite rare in practice.
3. There is no opportunity to review the result prior to deployment. A review of versions.json could have caught the 1.11.7 issue (see #49) before it went live and wreaked havoc; instead, it wasn't caught until the JuliaUp versiondb was built.
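
To illustrate limitation 2, here is a minimal sketch (in Python for brevity; the real tool is written in Julia) of skipping platforms that are not expected to exist yet for a given version. The `FIRST_SUPPORTED` table and the function names are hypothetical; only the two platform cutoffs come from the text above. Since retroactive builds are possible, this is a heuristic, not a guarantee.

```python
# Hypothetical table: first (major, minor) release for which a platform
# is expected to have binaries. Only these two entries come from the issue text.
FIRST_SUPPORTED = {
    "aarch64-apple-darwin": (1, 7),
    "x86_64-unknown-freebsd": (0, 6),
}


def parse_major_minor(version):
    """Turn 'v1.7.0-rc1' or '1.7.0' into (1, 7)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def platforms_to_probe(version, platforms):
    """Keep only platforms whose first supported release is <= this version.

    Platforms missing from the table are assumed to be supported everywhere.
    """
    mm = parse_major_minor(version)
    return [p for p in platforms if mm >= FIRST_SUPPORTED.get(p, (0, 0))]
```

With this filter, a 1.6.x tag would never trigger a download attempt for `aarch64-apple-darwin`, which removes a whole class of requests that are known to fail.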

I think all three of these limitations are solvable simultaneously, but doing so would require significant changes both to the implementation and to how one interacts with this repository.

To start, note the description of the current implementation that exists in the code:
https://github.com/JuliaLang/VersionsJSONUtil.jl/blob/aee1ba3b5b3fb6e3fd8ce400b4369238f9a6e8ed/src/VersionsJSONUtil.jl#L84-L86

> We don't have a nice, neat list of what is or is not available

We didn't at the time, but we do now: it's versions.json! We could use the latest versions.json as a starting point when building a new one. We would only need to know what has changed.

The way I propose we go about this is as follows:

- Create a human-friendly representation of the core data from versions.json (versions, platforms, files) that's checked into this repository. I know everybody hates YAML, but for the sake of example, say it's called `expected-versions.yaml`.
- When a new version is created, part of the release process becomes submitting a PR to this repository to make the corresponding changes to `expected-versions.yaml`, including the version, its platforms, and the files associated with each platform for that version. Adding newly expected platform binaries for an existing version or set of versions would go through the same process.
- A new versions.json is built in the PR, using only the changes made in that PR, and a diff against the currently live versions.json is displayed. (It could be shown in an auto-posted comment, for example; any way of getting insight into the effects of the change should be fine.)
- When the PR is merged, the versions.json built as part of the PR is deployed. This would require enabling the repository option that requires PR branches to be up to date with the base branch before merging.
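
The incremental build in the steps above could be planned roughly as follows. This is a sketch in Python under stated assumptions, not the proposed implementation: the manifest shape (`{version: {platform: [file types]}}`) is a hypothetical in-memory form of `expected-versions.yaml`, and the `"files"`, `"triplet"`, and `"extension"` keys are assumed field names for versions.json entries.

```python
def expected_combinations(manifest):
    """Flatten {version: {platform: [file types]}} into a set of tuples."""
    return {
        (version, platform, file_type)
        for version, platforms in manifest.items()
        for platform, file_types in platforms.items()
        for file_type in file_types
    }


def live_combinations(versions_json):
    """Collect the combinations already recorded in the live versions.json.

    Assumes each version maps to {"files": [{"triplet": ..., "extension": ...}]}.
    """
    return {
        (version, f["triplet"], f["extension"])
        for version, entry in versions_json.items()
        for f in entry["files"]
    }


def plan_build(manifest, versions_json):
    """Return (to_fetch, to_drop): work the PR build has to do.

    to_fetch: expected but not yet live, so these need downloading/checksumming.
    to_drop:  live but no longer expected, so these should show up in the diff.
    """
    expected = expected_combinations(manifest)
    live = live_combinations(versions_json)
    return sorted(expected - live), sorted(live - expected)
```

Everything in neither list is carried over from the live file unchanged, which is what would keep the build time proportional to the size of the PR rather than to the full release history.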

This setup would also provide a concrete set of expectations that can be checked via CI prior to deployment to ensure correctness in the result.
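
One possible shape for such a CI check, again as a Python sketch with assumed names: it confirms that every expected combination is present in the built file and that its recorded checksum looks like a SHA-256 digest. The `"sha256"`, `"triplet"`, and `"extension"` field names and the manifest layout are assumptions for illustration.

```python
import re

# A SHA-256 digest written as 64 lowercase hex characters.
SHA256_RE = re.compile(r"[0-9a-f]{64}")


def check_built(manifest, built):
    """Return a list of human-readable problems; an empty list means success.

    manifest: {version: {platform: [file types]}} (hypothetical layout)
    built:    {version: {"files": [{"triplet", "extension", "sha256"}, ...]}}
    """
    errors = []
    for version, platforms in manifest.items():
        files = built.get(version, {}).get("files", [])
        by_key = {(f.get("triplet"), f.get("extension")): f for f in files}
        for platform, file_types in platforms.items():
            for file_type in file_types:
                entry = by_key.get((platform, file_type))
                if entry is None:
                    errors.append(f"{version} {platform} {file_type}: missing")
                elif not SHA256_RE.fullmatch(str(entry.get("sha256", ""))):
                    errors.append(f"{version} {platform} {file_type}: bad sha256")
    return errors
```

A CI job could fail the PR whenever the returned list is non-empty and print the entries as the failure message.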

There is one additional implementation detail that I think we could take advantage of: our binaries are hosted on AWS S3. S3 exposes object metadata that can be queried to see whether changes were made outside of this workflow that need to be reflected in versions.json. For example, we could compare ETags (equal to the MD5 hash for files not uploaded in parts), or even just the last modified date, and only redo checksumming if something has changed. We could also use the checksum files that are built as part of the release process rather than needing to download each binary.
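
A rough sketch of the metadata comparison, in Python and with hypothetical names throughout. It assumes the ETag and last-modified values were recorded during a previous build and fetched again by some means (for example, an HTTP HEAD request), and it treats an ETag containing a `-` as coming from a multipart upload, where it is not a plain MD5 digest. The ETag is also not an MD5 digest under some server-side encryption settings, which this sketch ignores.

```python
import hashlib


def needs_rechecksum(recorded, current):
    """Decide whether a file must be downloaded and checksummed again.

    recorded: metadata saved by a previous build, or None if never seen.
    current:  metadata just fetched; both are dicts with 'etag' and
              'last_modified' keys (hypothetical layout).
    """
    if recorded is None:
        return True
    return (
        recorded["etag"] != current["etag"]
        or recorded["last_modified"] != current["last_modified"]
    )


def etag_matches_md5(etag, data):
    """Compare an ETag against the MD5 of the downloaded bytes.

    Returns None for multipart-style ETags ('<hex>-<parts>'), where the
    comparison is not meaningful.
    """
    etag = etag.strip('"')  # the HTTP header value is quoted
    if "-" in etag:
        return None
    return etag == hashlib.md5(data).hexdigest()
```

If `needs_rechecksum` returns `False` for a file, its existing versions.json entry can be reused as-is.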

For reference, the full code comment at the permalink above reads:

	# We're going to collect the combinatorial explosion of version/os-arch possible downloads.
	# We don't have a nice, neat list of what is or is not available, and so we're just going to
	# try and download each file, and if it exists, yay. Otherwise, bleh.
