Contributor

@satur9nine satur9nine commented Nov 27, 2025

Part 3 of 3

Optimize Prepend: handle alignment and byte writes in one pass

This is a break-up of #8766

Improves performance of the PythonTest over part 2 by 5.4%

python-perf-prep.txt
python-perf-prepend.txt

- encode.Write(packer.voffset, self.Bytes, self.Head(), x)
+ new_head = self.head - N.VOffsetTFlags.bytewidth
+ self.head = UOffsetTFlags.py_type(new_head)
+ encode.Write(packer.voffset, self.Bytes, new_head, x)
Contributor

Where is self.head = UOffsetTFlags.py_type(new_head) coming from in these changes? I would expect self.Head() == self.head, and Write itself did not change 😕

Contributor Author

@satur9nine satur9nine Nov 28, 2025

I'm not quite sure about your question. new_head is a local variable, which should be faster than accessing the field self.head or calling self.Head().

Or maybe you are asking why I added the call to UOffsetTFlags.py_type. To be honest, I was shuffling a lot of code around and saw that in the existing code, pretty much every time head is assigned, this function is called on the value, as in WriteVtable:

self.head = UOffsetTFlags.py_type(objectStart)

I should have looked closer; I see now that py_type is just int. So all the XYZ.py_type calls in this code are effectively conversions to int, but everything is already an integer, so we don't need the call. I updated #8808 to avoid more casting; the improvements are even better and the code changes are cleaner too.
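To illustrate the point about local variables and redundant int casts, here is a minimal sketch. The Flags and Holder classes are hypothetical stand-ins written for this example, not the flatbuffers classes; the only detail taken from the discussion is that py_type is plain int. Both methods compute the same result, and the second avoids the extra cast and the accessor call.

```python
import timeit


class Flags:
    """Hypothetical stand-in for a flags class whose py_type is just int."""
    bytewidth = 2
    py_type = int


class Holder:
    """Hypothetical object with a head field and a Head() accessor."""

    def __init__(self, head=1024):
        self.head = head

    def Head(self):
        return self.head

    def step_with_cast(self):
        # Cast the new value with py_type, then read it back via the accessor.
        self.head = Flags.py_type(self.head - Flags.bytewidth)
        return self.Head()

    def step_with_local(self):
        # Compute once into a local, store it, and reuse the local.
        new_head = self.head - Flags.bytewidth
        self.head = new_head
        return new_head


if __name__ == "__main__":
    # Timings vary by machine and interpreter; treat them as indicative only.
    for name in ("step_with_cast", "step_with_local"):
        holder = Holder(head=10**9)
        seconds = timeit.timeit(getattr(holder, name), number=200_000)
        print(f"{name}: {seconds:.4f}s")
```

Whether the difference matters depends on how hot the call site is, so it is worth measuring on the actual workload.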

  def Prepend(self, flags, off):
-     self.Prep(flags.bytewidth, 0)
-     self.Place(off, flags)
+     size = flags.bytewidth
Contributor

Is this primarily faster because it skips self.Pad(alignSize)? I guess it is a value judgment whether we want more performance or simpler code here. I'm fine with both.

Contributor Author

Yes, exactly. Calling Pad is what makes the original implementation expensive; inlining the implementations and avoiding Pad helps.
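The trade-off discussed here can be sketched with a toy back-to-front builder. This is not the flatbuffers implementation and not the code from this PR: ToyBuilder, its method names, and the fixed-size buffer are assumptions made for illustration. prepend_simple aligns via a per-byte pad loop and then places the value, while prepend_inlined computes the padding and writes the value in one pass.

```python
import struct


class ToyBuilder:
    """Hypothetical builder that fills a fixed buffer from the end backwards.

    Assumes the buffer is large enough; growth is omitted from this sketch.
    """

    def __init__(self, size=64):
        self.buf = bytearray(size)
        self.head = size  # index of the first used byte

    def pad(self, n):
        # Write n zero bytes, one at a time.
        for _ in range(n):
            self.head -= 1
            self.buf[self.head] = 0

    def prep(self, size):
        # Pad so the bytes written so far are a multiple of size
        # (size is assumed to be a power of two).
        used = len(self.buf) - self.head
        self.pad((-used) & (size - 1))

    def place(self, fmt, size, value):
        self.head -= size
        struct.pack_into(fmt, self.buf, self.head, value)

    def prepend_simple(self, fmt, size, value):
        # Readable version: align first, then write.
        self.prep(size)
        self.place(fmt, size, value)

    def prepend_inlined(self, fmt, size, value):
        # One-pass version: no calls to prep, pad, or place.
        head = self.head
        align = (-(len(self.buf) - head)) & (size - 1)
        new_head = head - align - size
        # Zero the padding with a same-length slice assignment.
        self.buf[new_head + size:head] = bytes(align)
        struct.pack_into(fmt, self.buf, new_head, value)
        self.head = new_head
```

Both methods should produce identical buffers; the inlined one saves a few Python-level calls per value at the cost of readability, which is the value judgment raised above.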

Contributor Author

After I improved #8808, I see this change gives only a very small bump for all that inlining. Feel free to take it or drop it; the existing code is definitely more readable, and most of the gains come from parts 1 and 2.

Collaborator

Given this is such a small improvement, I'm tempted to drop it, especially with no commentary :)

Contributor Author

Sure, most improvements come from the previous commits!

Contributor

fliiiix commented Nov 28, 2025

Neat, I left a few comments, but overall I think this can be merged 👍

@satur9nine
Contributor Author

Not a significant improvement; #8807 and #8808 suffice.

@satur9nine satur9nine closed this Dec 2, 2025
@satur9nine satur9nine deleted the python-perf-prepend branch December 2, 2025 00:28