
@glittershark
Member

@glittershark glittershark commented Nov 1, 2025

Motivation

This is a draft PR to replace the null-terminated C-style strings in Value with Pascal strings, where the length is stored in the allocation before the data. This should be considered basically an experiment right now: this ended up being a big optimization for tvix, but it's unclear how much of a win it'd be for cppnix.

Context

Depends on #14444
Depends on #14470

@github-actions github-actions bot added new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority c api Nix as a C library with a stable interface labels Nov 1, 2025
@Ericson2314 Ericson2314 force-pushed the pascal-strings branch 2 times, most recently from dda5c2f to 15a9f93 November 1, 2025 19:03
@Ericson2314
Member

CC @Radvendii

@Ericson2314 Ericson2314 changed the title Pascal strings Use hybrid C / Pascal strings in the evaluator Nov 1, 2025
@Ericson2314
Member

OK, I am taking a look at the sanitizer failure.

@xokdvium
Contributor

xokdvium commented Nov 1, 2025

Would like to see this through :)

@Ericson2314

This comment was marked as resolved.

@edolstra
Member

edolstra commented Nov 3, 2025

What sort of performance improvement can we expect from this? (Memory, speed?)

@glittershark
Member Author

What sort of performance improvement can we expect from this? (Memory, speed?)

I think it's hard to say. For tvix the big performance win was saving a word of cache footprint by converting Rust fat pointers to thin pointers. For C++nix the value representation already uses only one word for the pointer (and another word for context, iirc), so the win is less clear. Plausibly computing the length gets cheaper (I have definitely worked in systems where strlen is really costly!), but in my experience length lookups don't end up being that hot in Nix evaluations.

On the other hand, tvix had to incur the cost of an extra pointer indirection to find a string's length. C++nix already has to chase a pointer to look up the length of a string, so the delta in this case may be more clearly beneficial.

As is always the case, we should benchmark.

@Ericson2314
Member

Ericson2314 commented Nov 3, 2025

In the meeting (going on right now) we also talked about how parsers going through strings character by character were needlessly quadratic. This might not be in our test suite today, but @tomberek can easily add such an example.

I would expect that, since this dramatically improves that use case, the PR should be worth it overall, even if it doesn't improve other use cases (as long as it also doesn't make them worse).

In short, the plan to me is:

  1. Benchmark
  2. If it seems worth it, merge
  3. Otherwise, add parser test case
  4. Benchmark again
  5. Almost certainly will be worth it (unless we made a mistake in the implementation)
  6. Merge

So all paths lead to merging, but we do make sure we have evidence first.

@Radvendii

This comment was marked as resolved.

@Radvendii

This comment was marked as resolved.

@glittershark
Member Author

I would really not be surprised if this issue is just UB - the original implementation was pretty messy, and I never got around to combing through it to find all the ways I was holding C++ wrong.

@dpulls

dpulls bot commented Nov 3, 2025

🎉 All dependencies have been resolved!

@Radvendii
Contributor

This is what I've found:

  1. the problem has to do with invalid context strings.
  2. it seems to be getting set to another string, rather than turning into a null pointer or something.
  3. that context string is not getting set to the invalid state when it's initialized via mkPath
  4. the problem goes away when the GC is disabled

@Radvendii
Contributor

Oh! Could it be that Boehm is confused because we're holding onto pointers that don't point at the beginning of the allocation?

So then it concludes that there's no reference to it and frees it?

hmm... presumably Boehm is smarter than that...

@Radvendii

This comment was marked as resolved.

@Ericson2314
Member

Applied the fix, and factored out a new prep PR of #14470 to keep this smaller.

@glittershark
Member Author

boehm delenda est (Boehm must be destroyed)

@xokdvium
Contributor

xokdvium commented Nov 4, 2025

So then it concludes that there's no reference to it and frees it?

Well, it's not surprising, considering that we have interior pointer detection only for the first 8 bytes, and only on 64-bit systems, to make that work with low-bit tagging.

@Ericson2314 Ericson2314 marked this pull request as ready for review November 4, 2025 17:04
@Ericson2314 Ericson2314 requested a review from edolstra as a code owner November 4, 2025 17:04
@Radvendii
Contributor

boehm delenda est

This might be my next project.

to make that work with low-bit tagging.

I was wondering how that worked!

@dpulls

dpulls bot commented Nov 9, 2025

🎉 All dependencies have been resolved!

@xokdvium xokdvium force-pushed the pascal-strings branch 2 times, most recently from 0667013 to 3bc870e November 10, 2025 00:49
Contributor

@xokdvium xokdvium left a comment


Reworked slightly to reduce the diff and addressed review comments from @Radvendii. Also got rid of an interior pointer footgun that I could find. I'm happy with this diff.

return getStorage<FunctionApplicationThunk>();
}

const char * pathStr() const noexcept
Member


@xokdvium, it seems like there should be some method returning a const StringData & for the path, in terms of which this and the other method are written?

Contributor


StringData is more of an implementation detail, though. No code actually cares about it beyond the string_view and c_str accessors. Maybe it will start caring, but right now it doesn't, so there's no immediate need to expose this method.

Member


OK I guess that is fine.

FWIW, I would like to try to get rid of the trailing null byte at some point, and then the const char * ones really ought to go away.

Comment on lines +200 to +201
size_type size_;
char data_[];
Member


you want these public?

Contributor


It's not entirely clear, but to be on the safe side, layout-compatible types should also have the same member visibility.

Member

@Ericson2314 Ericson2314 Nov 10, 2025


Eh, we are being extra safe on that, but less safe about preventing people from accessing the fields directly by mistake.

https://stackoverflow.com/questions/36149462/does-public-and-private-have-any-influence-on-the-memory-layout-of-an-object

Reading that, it seems like we are probably fine so long as all the fields have the same visibility.

Ericson2314 and others added 2 commits November 10, 2025 00:54
Forgot to print in one case

Co-authored-by: Aspen Smith <[email protected]>
Replace the null-terminated C-style strings in Value with hybrid C /
Pascal strings, where the length is stored in the allocation before the
data, and there is still a null byte at the end for the sake of C
interop.

Co-Authored-By: Taeer Bar-Yam <[email protected]>
Co-Authored-By: Sergei Zimmerman <[email protected]>
@Ericson2314 Ericson2314 added this pull request to the merge queue Nov 10, 2025
Merged via the queue into NixOS:master with commit a786c9e Nov 10, 2025
16 checks passed

Labels

c api Nix as a C library with a stable interface new-cli Relating to the "nix" command with-tests Issues related to testing. PRs with tests have some priority

Projects

None yet


5 participants