-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Use hybrid C / Pascal strings in the evaluator #14442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dda5c2f to
15a9f93
Compare
|
CC @Radvendii |
15a9f93 to
6f82634
Compare
|
OK I am taking a look at the sanitzer failure. |
|
Would to see this through :) |
d5a1e45 to
f89ef40
Compare
91905f3 to
2f15843
Compare
This comment was marked as resolved.
This comment was marked as resolved.
2f15843 to
8e7cd91
Compare
|
What sort of performance improvement can we expect from this? (Memory, speed?) |
I think it's hard to say. For tvix the big performance win was saving a word of cache footprint by converting rust fat pointers to thin pointers. For C++nix the value representation already uses only one word for the pointer (and another word for context iirc) so the win is less clear. Plausibly length gets cheaper (I have definitely worked in systems where strlen is really costly!) but ime length doesn't end up being that hot in Nix evaluations. On the other hand, tvix had to incur the cost of an extra pointer indirection to find string length, so maybe the delta in this case is more clearly beneficial, since c++nix already has to chase a pointer to look up the length of a string. As is always the case, we should benchmark. |
|
In the meeting (going on right now) we also talked about how parsers going through strings character by character were needlessly quadratic. This might not be in our test suite today, but @tomberek can easily add such an example. I would expect that since this dramatically includes that use-case, even if it is doesn't improve other use-cases (but also doesn't make them worse) the PR should overall be worth it. In short, the plan to me is:
So all paths lead to merging, but we do make sure we have evidence first. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
I would really not be surprised if this issue is just UB - the original implementation was pretty messy, and I never got around to combing through it to find all the ways I was holding C++ wrong. |
|
🎉 All dependencies have been resolved ! |
|
This is what I've found:
|
|
Oh! Could it be that boehm is confused because we're holding onto pointers but they don't point at the beginning of the allocation? So then it concludes that there's no reference to it and frees it? hmm... presumably boehm is smarter than that... |
8e7cd91 to
ea6ba08
Compare
This comment was marked as resolved.
This comment was marked as resolved.
ea6ba08 to
874640c
Compare
|
Applied the fix, and factored out a new prep PR of #14470 to keep this smaller. |
|
boehm delenda est |
Well, it's not suprising consdering that we have interior pointer detection only for the first 8 bytes and only on 64 bit systems to make that work with low-bit tagging. |
This might be my next project.
I was wondering how that worked! |
|
🎉 All dependencies have been resolved ! |
874640c to
7a642f2
Compare
0667013 to
3bc870e
Compare
xokdvium
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reworked slightly to reduce the diff and addressed review comments from @Radvendii. Also got rid of a interior pointer footgun that I could find. I'm happy with this diff
| return getStorage<FunctionApplicationThunk>(); | ||
| } | ||
|
|
||
| const char * pathStr() const noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xokdvium seems like there should be some method getting a const StringData & from the path that this and the other method are written in terms of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StringData is more of implementation detail though. No code actually cares about it beyond the string_view and c_str accessors. Maybe it will start caring, but now it doesn't so there's no immediate need to expose this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I guess that is fine.
FWIW I would like try to get rid of the trailing null byte at some point. and then the const char * ones really ought to go away.
| size_type size_; | ||
| char data_[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you want these public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not too clear, but if we want to be on the safe side the layout compatible types should also have the same visibility of members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh we are being extra safe on that, but less safe on preventing people from going through fields by mistake.
reading that, it seems like we are probably fine so long as all the fields are the same visibility.
Forgot to print in one case Co-authored-by: Aspen Smith <[email protected]>
Replace the null-terminated C-style strings in Value with hybrid C / Pascal strings, where the length is stored in the allocation before the data, and there is still a null byte at the end for the sake of C interopt. Co-Authored-By: Taeer Bar-Yam <[email protected]> Co-Authored-By: Sergei Zimmerman <[email protected]>
3bc870e to
3bf8c76
Compare
Motivation
This is a draft PR to replace the null-terminated C-style strings in Value with pascal strings, where the length is stored in the allocation before the data. This should be considered basically an experiment right now - this ended up being a big optimization for tvix, but it's unclear how much of a win it'd be for cppnix
Context
Depends on #14444
Depends on #14470