@gizahNL
Contributor

@gizahNL gizahNL commented Nov 24, 2021

Low-hanging fruit: change glClientWaitSync behavior
Optimize the fragment shader to discard fragments that are invisible (alpha < 0.01)
Change shader (less branching due to non-uniform flow control)
Using a separate shader program as a "fast path" and the OpenGL blend func (executed on the GPU ROPs, fewer texture reads) shaves off another ~5%

Total improvement: GPU utilization drops from ~80-85% to ~50-55% when running 4 HD 50i channels, each with 2 layers, which is close to the GPU utilization of 2.1.
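
To make the discard and blend-func items above concrete, here is a minimal CPU-side sketch in C++ of what one fragment on such a fast path would compute. It is a model, not the PR's shader code: the `Rgba` type, the `composite` function, premultiplied alpha, and the blend factors `GL_ONE` / `GL_ONE_MINUS_SRC_ALPHA` are all assumptions made for illustration. Only the 0.01 alpha threshold comes from the description.

```cpp
#include <cassert>

// RGBA colour with premultiplied alpha; every channel is in [0, 1].
// Hypothetical type, used only for this illustration.
struct Rgba {
    float r, g, b, a;
};

// Threshold from the PR description: alpha below 0.01 counts as invisible.
constexpr float kInvisibleAlpha = 0.01f;

// Models one fragment on an assumed fast path.
// Assumption: premultiplied alpha with glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
// the PR does not state which blend factors it actually uses.
inline Rgba composite(const Rgba& src, const Rgba& dst) {
    assert(src.a >= 0.0f && src.a <= 1.0f);
    if (src.a < kInvisibleAlpha) {
        return dst;  // models `discard`: the framebuffer value is left untouched
    }
    const float k = 1.0f - src.a;  // weight of the destination colour
    return {src.r + dst.r * k, src.g + dst.g * k, src.b + dst.b * k, src.a + dst.a * k};
}
```

On the GPU the last line would be done by the fixed-function blend stage rather than by the shader, which is why the shader would not need to read the destination as a texture.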

Gijs Peskens added 4 commits November 24, 2021 15:01
Instead of repeatedly calling glClientWaitSync with a 1
nanosecond timeout, call it with a 20 ms timeout with flush.

Decreases average GPU utilisation on my testbench by about
10% (~85%->~75%, 4 * 1080i5000 on k620)
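
The difference between the two waiting strategies in this commit can be sketched with a small C++ model. Nothing here calls OpenGL: `FakeFence`, `client_wait`, and `wait_until_signalled` are hypothetical stand-ins, and the simulated clock ignores the cost of each call. The sketch only counts how many wait calls each timeout needs before the fence is signalled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GL fence: it becomes signalled once
// `ready_at_ns` nanoseconds of simulated time have passed.
struct FakeFence {
    std::uint64_t ready_at_ns;
    std::uint64_t now_ns = 0;
    std::uint64_t wait_calls = 0;

    explicit FakeFence(std::uint64_t ready) : ready_at_ns(ready) {}

    // Models one glClientWaitSync(sync, flags, timeout) call: it returns as
    // soon as the fence is signalled, or after `timeout_ns` at the latest.
    // Returns true when the fence is signalled.
    bool client_wait(std::uint64_t timeout_ns) {
        ++wait_calls;
        if (now_ns >= ready_at_ns) return true;
        const std::uint64_t remaining = ready_at_ns - now_ns;
        if (remaining <= timeout_ns) {
            now_ns = ready_at_ns;  // signalled before the timeout expired
            return true;
        }
        now_ns += timeout_ns;  // timeout expired, fence still pending
        return false;
    }
};

// Calls client_wait until the fence is signalled; returns the number of calls.
inline std::uint64_t wait_until_signalled(FakeFence& fence, std::uint64_t timeout_ns) {
    assert(timeout_ns > 0);  // a zero timeout would never advance the simulated clock
    while (!fence.client_wait(timeout_ns)) {
    }
    return fence.wait_calls;
}
```

In this model a fence that needs 5 microseconds takes thousands of calls with a 1 ns timeout and a single call with a 20 ms timeout (20,000,000 ns, since the real function takes its timeout in nanoseconds).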
@gizahNL changed the title from "[WIP] opengl linux performance" to "opengl performance improvements" on Nov 25, 2021
When running the fast path, do blending via OpenGL; this is a tad bit faster.
Gijs Peskens added 4 commits November 29, 2021 15:41
Since we keep filling the command buffer there is no need
to flush and we can safely forgo it.
This marginally improves performance.
@Julusian
Member

Trying this with 4x 1080i50 channels (each playing 2 AMB) on Ubuntu 22.04 with a GTX 1060, I am seeing GPU usage go from 40-45% to 38-42%, which is not a significant improvement. What GPU and OS are you using?

On Windows it gets stuck in an error loop when playing any media, with caspar::gl::ogl_invalid_framebuffer_operation_ext.

> Change shader (less branching due to non-uniform flow control)

It has been quite a while (~10 years) since I have had to think about optimising CUDA code, but from what I remember branching is only an issue when threads in the same cluster take different routes. So for us, different branches being used for each frame being composited should have no major impact?
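
The point about threads taking different routes can be illustrated with a deliberately simplified C++ cost model. It assumes that lanes in one group run in lockstep, so the group pays for every distinct branch that any of its lanes takes. The function `group_cost` and the cost numbers are hypothetical, and real hardware behaviour is more nuanced.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplified lockstep model: the group executes each distinct branch that
// at least one lane takes, so its cost is the sum over those branches.
// branch_of_lane[i] is the branch id taken by lane i;
// branch_cost[b] is the (made-up) cost of running branch b.
inline int group_cost(const std::vector<int>& branch_of_lane,
                      const std::vector<int>& branch_cost) {
    const std::set<int> taken(branch_of_lane.begin(), branch_of_lane.end());
    int total = 0;
    for (int b : taken) {
        assert(b >= 0 && static_cast<std::size_t>(b) < branch_cost.size());
        total += branch_cost[static_cast<std::size_t>(b)];
    }
    return total;
}
```

Under this model, a group whose lanes all take the same branch pays only for that branch, which matches the case where the branch condition is the same for every fragment of a frame. Only a group with mixed branches pays for both.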

What is the cost of frequently switching shaders? Some layers on a channel could be on the fast shader and some on the slow one.

As it currently stands, I am not convinced that this will give a noticeable performance benefit to most users, so I am not convinced it is worth the extra complexity.
