Reduce multiplication depth in _gr_poly_compose_axnc #2483
base: main
5906852 to
9156a6e
Compare
This problem shows up in a few places. Note that there is a In the Brent-Kung power series composition, I used this criterion for whether to favor reduced multiplication depth when computing successive powers:
You could do the same thing here, and we should probably do something similar by default in About the particular algorithm used: your way to reduce the depth to |
looks useful. We could use that as a subroutine here, the caveat is
that's the intention, indeed. Do you think it's worth it?
actually this solution doesn't have that benefit either. Indeed I haven't thought about it. if the goal is to use as many squares as possible, it's also possible to use only Something like the actual code may want to use non-recursive implementation for performance. non-recursive implementation sanity check that it's correct |
9156a6e to
099d6a9
Compare
d1557e4 to
00ddb74
Compare
if this were C++ I could do anyway, I believe it's fixed. I have two possible algorithms for the
which one do you prefer? That said, maybe memory allocation is cheap enough that it would be faster to allocate a new buffer, |
src/gr_generic/generic.c
Outdated
    if (i % 2 == 0)
        status |= sqr(GR_ENTRY(res, i, sz), GR_ENTRY(res, i / 2, sz), ctx);
    else
        status |= mul(GR_ENTRY(res, i, sz), GR_ENTRY(res, i - 1, sz), x, ctx);
Should probably be mul(GR_ENTRY(res, i, sz), GR_ENTRY(res, (i + 1) / 2, sz), GR_ENTRY(res, i / 2, sz), ctx);
Actually res[i] = res[i-1] * res has the advantage that, even in a ball field, if the initial number is exact and has precision much less than prec, multiplying by that could be slightly faster than multiplying by a general number.
But if the goal is to reduce multiplication depth, res[i] = res[i/2] * res[(i+1)/2] is better. Quick testing shows res[i] = res[i/2] * res[(i+1)/2] resulting in less error too.
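To make the depth comparison in this comment concrete, here is a minimal sketch that only tracks multiplication depth for the two recurrences, res[i] = res[i-1] * x and res[i] = res[i/2] * res[(i+1)/2]. It does not use the FLINT API; the function names `depth_chain` and `depth_halves` are hypothetical, and each `depth` array is assumed to hold n + 1 entries, indexed by exponent.

```c
#include <assert.h>
#include <stddef.h>

/* Depth of x^i when each power is built as res[i] = res[i-1] * x.
   depth[] must have room for n + 1 entries; depth[0] is unused. */
void depth_chain(int *depth, size_t n)
{
    assert(depth != NULL && n >= 1);
    depth[1] = 0;                          /* x itself costs no multiplication */
    for (size_t i = 2; i <= n; i++)
        depth[i] = depth[i - 1] + 1;       /* one more level per step */
}

/* Depth of x^i when each power is built as res[i] = res[i/2] * res[(i+1)/2].
   For even i this is a squaring of res[i/2]. */
void depth_halves(int *depth, size_t n)
{
    assert(depth != NULL && n >= 1);
    depth[1] = 0;
    for (size_t i = 2; i <= n; i++)
    {
        int lo = depth[i / 2];
        int hi = depth[(i + 1) / 2];
        depth[i] = 1 + (lo > hi ? lo : hi); /* one level above the deeper operand */
    }
}
```

With n = 16, the chain gives x^16 a depth of 15, while the halving rule gives a depth of 4. Both use one multiplication per new power, so the difference is only in how the operands are chosen.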
The
In this code, we want to compute $a, a^2, a^3, \ldots, a^{\texttt{len1}}$.
Previously, it computes $a^i = a^{i-1} \cdot a$. Now, we use $a^i = a^{2^j} \cdot a^{i-2^j}$, where $j$ is flint_ctz(i).
Both ways, len1 multiplications are performed. However, the latter makes the tree depth only $O(\log \texttt{len1})$.
The advantage is that if we have an arb_poly, then the accumulated error would be approximately linear in the computation tree depth. This change reduces the final error.
The disadvantage of this method includes
What do you think?
Attached benchmark below. The code is rather ugly, but I'm pretty sure it measures the right thing.
c.c
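The splitting rule from the description, $a^i = a^{2^j} \cdot a^{i-2^j}$ with $j$ the number of trailing zero bits of $i$, can be sketched in plain C as follows. This is an illustration under stated assumptions, not the code in the PR: it works on `unsigned long long` values with wrap-around arithmetic instead of generic ring elements, it extracts the lowest set bit with a bit trick rather than calling `flint_ctz`, and it assumes that when $i$ is itself a power of two (so $i - 2^j = 0$) the power is obtained by squaring $a^{i/2}$. The function name `powers_ctz_split` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill res[1..n] with a^1..a^n (modulo 2^64) and depth[1..n] with the
   multiplication depth of each power. Both arrays need n + 1 entries. */
void powers_ctz_split(unsigned long long *res, int *depth,
                      unsigned long long a, size_t n)
{
    assert(res != NULL && depth != NULL && n >= 1);
    res[1] = a;
    depth[1] = 0;
    for (size_t i = 2; i <= n; i++)
    {
        size_t low = i & (~i + 1);          /* 2^j, where j = trailing zeros of i */
        if (low == i)
        {
            /* i is a power of two: square a^(i/2) (assumption, see lead-in) */
            res[i] = res[i / 2] * res[i / 2];
            depth[i] = depth[i / 2] + 1;
        }
        else
        {
            /* a^i = a^(2^j) * a^(i - 2^j); both operands are already known */
            size_t rest = i - low;
            int d = depth[low] > depth[rest] ? depth[low] : depth[rest];
            res[i] = res[low] * res[rest];
            depth[i] = d + 1;
        }
    }
}
```

Each new power still costs exactly one multiplication or squaring. In this sketch the depth of $a^i$ works out to the bit length of $i$ plus its number of set bits, minus two, so it stays logarithmic in len1 rather than linear.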