-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Use optimised forms of class release functions where possible #9673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
In makeDtorCall we call a release function of the form void release(ObjectData *obj, Class *cl) but in some cases we load the second class pointer argument directly from the object pointer. However, we can actually invoke a more optimised version of the release function: void release2(ObjectData *obj) and let this release function do the work of extracting the class pointer. Doing so helps to reduce the total size of the Jit code cache by removing an associated load for every call. It may slightly increase the total instruction execution count by adding an extra level of indirection, but I think it's worth it.
|
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this in D87429159. (Because this pull request was imported automatically, there will not be any future comments.) |
|
|
||
| Optional<OptObjReleaseFunc> Class::optReleaseFunc() const { | ||
| if (m_releaseFunc == ObjectData::release) | ||
| return ObjectData::release2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unbraced multiline conditionals have always been a functional style prohibition. cf. "goto fail" for the reason why. prefer to either fit it on a single line or add braces. (i mention this bc i've noticed it in other PR's you've made recently, which i've seen out of idle curiosity.)
additionally, this doesn't need to be a member function of Class. m_releaseFunc is exposed as a const accessor by releaseFunc(), so you can and should just do this work at the one single callsite instead of polluting Class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK thanks for letting me know! Happy to update that patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh wow, do comments not all commit transactionally with a "review"? clearly i have never used github PRs before in my life (accurate).
mxw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's been over five years since i worked at this team or company, but surely nobody would object (ha ha ha) to some on-a-whim free labor, right?
it's not apparent to me that the additional call layer is the only potential cost; you may also be incurring an additional dependent load. loading the Class of an object is an lea (on x86 at least) which will likely be eliminated much of the time in larger function bodies if the Class is required elsewhere (which is common). pushing the load down into native code precludes that load elimination, and at no benefit to code size if the Class was already live in a register anyway.
ObjectData::release() requires that the cls argument be the Class of the obj argument. i suspect the reason your version of things was not done in the past is due to the cost of said dependent load. you can probably check this yourself by looking at the code output of some of the test cases.
Without this patch on AArch64 I see a load of the class pointer from the object (required for the second argument of the original release function) in the Jitted code. That means every instance we generate a call to this runtime function we are also generating an additional load. What this patch does is move that additional load to a single point (in the new release2 function), which reduces the overall size of the Jitted code. |
|
i was imagining that identical loads might be eliminated if the value is already known to be live in a register. but upon more careful recollection i don't think vasm has any load or memory tracking except for the frame pointer specifically. my memory is of course very obsolete, just tossing the idea out there. anyway all that said, object destruction has to actually free up memory so i don't see why an extra load would matter; maybe the code size reduction is worth it regardless. someone on the inside would have to notice this PR and perf test it i guess. (part of the reason i commented, idk who's doing that these days if anyone.) |
|
I seem to recall that one benefit of the current scheme was that if we knew the Class statically, we could supply it as a constant, saving a load all together. |
|
i think that case is handled separately just above where this diff makes its change. |
|
Thanks @david-arm for the PR. This passes CI tests internally. We will A/B perf test to see if there any significant changes. |
In makeDtorCall we call a release function of the form
void release(ObjectData *obj, Class *cl)
but in some cases we load the second class pointer argument directly from the object pointer. However, we can actually invoke a more optimised version of the release function:
void release2(ObjectData *obj)
and let this release function do the work of extracting the class pointer.
Doing so helps to reduce the total size of the Jit code cache by removing an associated load for every call. It may slightly increase the total instruction execution count by adding an extra level of indirection, but I think it's worth it.