
Conversation

@geofft (Collaborator) commented Aug 18, 2025

Mostly just want to see what the symbol validation failures are....
@indygreg (Collaborator)

The problem with this is that the glibc version will be too new and binaries won't run on older Linux distros.

We currently use Debian 8 Jessie as our base image / glibc for x86-64 and Debian 9 for aarch64 and cross-compiled arches.

Debian 8 is running glibc 2.17, which is ancient. Debian 9 is running glibc 2.24. Per https://github.com/pypa/manylinux, manylinux2014 is glibc 2.17. So upgrading to Debian 9 would drop manylinux2014 ABI compatibility and would make manylinux_2_24 the minimum supported platform tag.

IMO, enough time has passed that we should switch to Debian 9.

(It would be awesome if uv could record metrics for the glibc version seen in the wild. This would really help us make data-driven decisions about when it is safe to drop ABI compatibility.)

@geofft (Collaborator, Author) commented Aug 19, 2025

Yeah, I'm working on that; this is not for review yet ;)

@geofft added the ci:skip label Aug 22, 2025
@geofft (Collaborator, Author) commented Aug 22, 2025

Still not ready for review, but the last commit is the beginning of a tool to decrease the symbol version requirements where safe (a lot of symbols gained a new symbol version when glibc consolidated libdl, libm, etc. into libc). It is not yet wired into the build. With a manual (non-standalone) build of CPython from source, I am down to about 30 too-new symbols with that, of which ~half ought to be weakly linked (there's an example of how to do that with memfd_create) and ~half require staring at glibc to see exactly what they did with the ABI and how to manually bypass it (there's an example of how to do that with __libc_start_main).

@polarathene

The problem with this is that the glibc version will be too new and binaries won't run on older Linux distros.

Couldn't you build with zig cc?

That is the modern approach for targeting lower glibc versions (e.g. zig cc -target x86_64-linux-gnu.2.17): it allows a modern distro and toolchain instead of needing a much older build environment as a workaround.

@geofft (Collaborator, Author) commented Oct 9, 2025

I really should play with zig cc, thanks for the reminder, but my understanding is that the thing it does is equivalent to building with a sysroot of the older Debian version, which is a standard feature in the GNU and LLVM toolchains and wouldn't require adding a new tool. It's very nice that zig cc will download and set up the sysroot for you (and extremely nice that it handles non-Linux targets) but we could just do that ourselves. Honestly maybe we should just do that to get off the ancient Debian version.

But the thing this draft PR is getting at is targeting older glibc versions while adding support for newer features via e.g. weak linking. Both the zig cc and sysroot approaches when pointed at an older glibc will get you binaries that are equivalent to what we have now by just building in an old Docker container. I opened this to figure out what exactly those newer features are that we're missing out on, i.e., what symbols we aren't using because we're building on an older glibc that would start getting used when targeting a newer one.

@polarathene

I really should play with zig cc, thanks for the reminder, but my understanding is that the thing it does is equivalent to building with a sysroot of the older Debian version, which is a standard feature in the GNU and LLVM toolchains and wouldn't require adding a new tool

You can target a lower glibc by appending the version to the target, which is what would be desired here? Nothing specific to Debian.

Should you want to cross-compile, you can sort out sysroots like you would with the other toolchains. The latest Zig can also do a static glibc build, but only for the native target (it uses the host's static glibc package IIRC); there is no cross-compilation or alternate glibc version support there. (Note: static glibc is rarely advisable, and definitely not for this project.)

In Rust land, Zig also had benefits for static builds with musl targets: cross-compilation was much simpler when a project had external dependencies like OpenSSL that needed to be built from source IIRC (Rust alone would need a C toolchain set up for that target).


It's very nice that zig cc will download and set up the sysroot for you (and extremely nice that it handles non-Linux targets) but we could just do that ourselves. Honestly maybe we should just do that to get off the ancient Debian version.

I'm just bringing it up since, like with Docker, it does simplify things, and when it's possible to delegate through convenient tooling I tend to prefer that, provided it's not a maintenance burden. (I can't confidently say that Zig won't be; some projects do run into issues when building with Zig that can be friction to troubleshoot and get resolved upstream.)


I opened this to figure out what exactly those newer features are that we're missing out on, i.e., what symbols we aren't using because we're building on an older glibc that would start getting used when targeting a newer one.

Not exactly my area of expertise, but while looking into this project I noticed that glibc 2.17 probably means this optimization for close_range() (which requires glibc 2.34) would not be available, I assume?

That one in particular is perhaps a rather niche concern, I ran into it when troubleshooting a major performance regression for Python software running in containers at the time 😓

EDIT: Actually it seems like you've run into it as well (last I recall this was partially resolved in Docker v25, but was still waiting on its adoption of containerd 2.x). I did extensive research into this issue and got the fixes pushed upstream (roughly, it stems from bad config choices in the distributed systemd service files):

# apt iterates all available file descriptors up to rlim_max and calls
# fcntl(fd, F_SETFD, FD_CLOEXEC). This can result in millions of system calls
# (we've seen 1B in the wild) and cause operations to take seconds to minutes.
# Setting a fd limit mitigates.
#
# Attempts at enforcing the limit globally via /etc/security/limits.conf and
# /root/.bashrc were not successful. Possibly because container image builds
# don't perform a login or use a shell the way we expect.
RUN ulimit -n 10000 && apt-get update

On GitHub Actions runners, you won't have that problem (the limit is 65K). Debian isn't affected as much as a Docker build host either, since it patched a related systemd change (a workaround for another Debian issue related to their patched PAM).


I would assume it'd be much simpler for you to leverage Zig to support customized builds via the Dockerfile when it comes to adjusting the glibc version?

I haven't verified, but it might help resolve concerns like this:

if [ "${CC}" = "musl-clang" ]; then
    # In order to build the library with intrinsics, we need musl-clang to find
    # headers that provide access to the intrinsics, as they are not provided by musl. These are
    # part of the include files that are part of clang. But musl-clang eliminates them from the
    # default include path. So copy them into place.
    for h in ${TOOLS_PATH}/${TOOLCHAIN}/lib/clang/*/include/*intrin.h ${TOOLS_PATH}/${TOOLCHAIN}/lib/clang/*/include/{__wmmintrin_aes.h,__wmmintrin_pclmul.h,emmintrin.h,immintrin.h,mm_malloc.h,arm_neon.h,arm_neon_sve_bridge.h,arm_bf16.h,arm_fp16.h,arm_acle.h,arm_vector_types.h}; do

Depending on the various packages you're building from source, you could potentially delegate quite a few of those too, if that's acceptable. Quite a few have sys crates on crates.io, which would let you delegate those builds through Cargo and simplify maintenance of this project a bit. (Totally valid to prefer managing it all within the project if you need that, though!)

@geofft (Collaborator, Author) commented Oct 9, 2025

You can target a lower glibc by appending the version to the target, which is what would be desired here? Nothing specific to Debian.

Sorry, I was a bit unclear: what I'm saying is that we can get this same benefit without Zig by setting up a sysroot manually, which isn't that hard to do (at least for targeting an older glibc version; it's harder for non-Linux targets, where zig cc adds a lot more value).

Not exactly my area of expertise, but while looking into this project I noticed that glibc 2.17 probably means this optimization for close_range() (which requires glibc 2.34) would not be available, I assume?

I think that's right, yes. The goal here is for os.closerange() to use close_range(2) when available on the target system, without breaking functionality on glibc versions too old for it.

EDIT: Actually it seems like you've run into it as well

This one is in apt itself, i.e., an artifact of using an older version of apt as part of an older container.

Depending on the various packages you're building from source, you could potentially delegate quite a few of those too, if that's acceptable. Quite a few have sys crates on crates.io, which would let you delegate those builds through Cargo and simplify maintenance of this project a bit. (Totally valid to prefer managing it all within the project if you need that, though!)

That's an interesting question, I'll take a look!

Anyway, again, the purpose of this PR at the moment is just to see what the CI failures are if we target the newest glibc without doing anything fancy (because we have a validation test that things work on older versions of Debian). I pushed my commit with the weak-linking work just so it wasn't solely on my laptop, but it's not ready yet; this PR is intentionally in draft without a useful description :)

@indygreg (Collaborator)

zig cc employs multiple solutions at different layers of the stack.

For linking, they distribute a metadata file annotating all the glibc versions and their symbols. Then at link time it generates a symbol map of sorts that instructs the linker which symbols are in a fake libc.so.6. The linker then links against a placeholder library and not a real glibc ELF. (I can't recall if they use an assembly stub or other linker features; there are multiple ways to point the linker at something that isn't an SO.) This solution works for non-C/C++ code bases that only need to link against libc.so.6 (like Rust or Zig).

For C/C++ compilation where they need glibc C headers, I believe they distribute every distinct version of the header files from all glibc versions then create a symlink/hardlink tree that is added to the compiler include path. Essentially a dynamic sysroot.

Or at least this is how things worked when I looked at it a few years ago.

Fun fact: at one point I was going to author a Rust shim for LLVM/Clang that essentially implemented zig cc. But I was going to take it to the next level and support things like dynamically resolving headers and shared libraries from distro packages: essentially the sysroot equivalent of virtualenvs. The basic premise of the idea was a combo of "why isn't cross-compiling a solved problem in the year 202x?" and "you shouldn't need a pre-built container image or sysroot archive to cross-compile." I got as far as implementing LLVM option parsing in Rust (see crates in https://github.com/indygreg/toolchain-tools) and pure Rust crates for Debian/RPM packaging (see crates in https://github.com/indygreg/linux-packaging-rs). I had all the scaffolding in place, but I stopped hacking on this for the reasons explained in https://gregoryszorc.com/blog/2024/03/17/my-shifting-open-source-priorities/.

It's really too bad, because the state of cross-compiling involving C/C++ is stuck 20+ years in the past, and it forces language ecosystems to solve cross-compiling problems that wouldn't exist if the C/C++ toolchain tooling had turnkey support for basic scenarios (like glibc version targeting).

Since Astral is now in the hosted package building space, maybe building this is something that interests you. Charlie knows how to reach me :)
