Some speedup with SSE 4.1 by jpcima · Pull Request #340 · sfztools/sfizz

jpcima · 2020-08-01T10:13:19Z

This speeds up sample_quality=2 by 15 to 20% by using the SSE4.1 dot-product primitive and avoiding a bit of instruction latency.
This is only an illustration; the optimization should be made CPU-dispatched.
The strings effect could possibly benefit from a similar optimization.
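
For readers unfamiliar with the primitive mentioned above, here is a minimal sketch of a 4-tap dot product using `_mm_dp_ps`, selected at run time. It is not the code from this PR: the names `dot4`, `dot4Scalar`, `dot4Sse41` and the macro `SKETCH_X86_DISPATCH` are hypothetical, and the dispatch relies on the GCC/Clang builtin `__builtin_cpu_supports`, so other compilers and non-x86 targets fall back to the scalar path.

```cpp
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86_DISPATCH 1 // hypothetical macro, for this sketch only
#endif

// Portable reference path (hypothetical helper).
static float dot4Scalar(const float* a, const float* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

#ifdef SKETCH_X86_DISPATCH
// Compiled for SSE4.1 regardless of the global compiler flags.
__attribute__((target("sse4.1")))
static float dot4Sse41(const float* a, const float* b)
{
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    // Mask 0xF1: multiply all four lanes (high nibble),
    // write the sum into lane 0 only (low nibble).
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0xF1));
}
#endif

// Picks the SSE4.1 path only when the running CPU reports support for it.
float dot4(const float* a, const float* b)
{
#ifdef SKETCH_X86_DISPATCH
    if (__builtin_cpu_supports("sse4.1"))
        return dot4Sse41(a, b);
#endif
    return dot4Scalar(a, b);
}
```

In a real dispatcher the CPU check would presumably be done once and cached (for example in a function pointer) rather than on every call; it is kept inline here only to keep the sketch short.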

paulfd · 2020-08-02T21:05:55Z

Nice one! With runtime dispatch I think we can target interesting optimizations like this. Do you want me to benchmark it on Intel/AMD? I was thinking of working on ARM over the holidays, if I have some time 🙂

paulfd · 2020-08-02T21:07:37Z

btw the meanSquared SIMD helper is also a dot product.
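
To spell out the remark above: a mean of squares is the dot product of a buffer with itself, divided by its length. The sketch below uses the standard `std::inner_product`; the function name `meanSquaredSketch` is hypothetical and this is not sfizz's actual helper.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical illustration, not the sfizz implementation:
// mean(x^2) == dot(x, x) / n
float meanSquaredSketch(const float* x, std::size_t n)
{
    if (n == 0)
        return 0.0f; // assumption: an empty buffer yields 0
    const float sumOfSquares = std::inner_product(x, x + n, x, 0.0f);
    return sumOfSquares / static_cast<float>(n);
}
```

Any dot-product kernel, SIMD or not, could therefore be reused for this helper by passing the same buffer twice.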

paulfd · 2020-12-26T13:43:23Z

Considering the simde version you proposed, is this speedup obsolete? We could maybe have a runtime dispatcher.

jpcima · 2020-12-26T14:24:43Z

> Considering the simde version you proposed, is this speedup obsolete? We could maybe have a runtime dispatcher.

It's by no means obsolete, but it would be desirable to have the CPU dispatcher.

While experimenting with the strings effect, I found that loop unrolling can bring large speed gains, even more so when combined with some inlining (in some cases more than 4x on SSE, which might be explained by latency effects of memory or of individual instructions).
I'd like to try the same with the resampler, but the simde PR should be dealt with first.
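
As a rough illustration of the kind of unrolling described above (not the strings effect or resampler code itself), here is a dot product unrolled by four with independent accumulators, so that consecutive additions do not wait on each other. The function name `dotUnrolled4` is hypothetical, and no particular speedup is implied for this sketch.

```cpp
#include <cstddef>

// Hypothetical sketch: 4x unrolled dot product with separate accumulators.
float dotUnrolled4(const float* a, const float* b, std::size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;

    // Main loop: four independent multiply-add chains per iteration.
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    float sum = (s0 + s1) + (s2 + s3);

    // Tail: remaining 0 to 3 elements.
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

Note that splitting the sum across accumulators changes the order of floating-point additions, so results can differ in the last bits from a strictly sequential loop.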

paulfd · 2020-12-26T17:33:21Z

Sure, I think the simde PR is fine now.

jpcima force-pushed the sse-opt branch 3 times, most recently from 437892d to be7dad3, on August 1, 2020 11:02

Some speedup with SSE 4.1

bd4a10a

jpcima force-pushed the sse-opt branch from be7dad3 to bd4a10a on August 1, 2020 11:04

jpcima wants to merge 1 commit into sfztools:develop from jpcima:sse-opt
