Full screen triangle optimization
So here’s a well-known optimization trick that also tells you a bit about how GPUs work.
Full screen post-processing effects are usually drawn as a pair of large triangles that cover the whole screen. A fragment shader then processes every pixel. But you can make it a tiny bit faster by drawing a single large triangle instead, one big enough that the whole screen fits inside it and the overhang beyond the screen edges is clipped away.
Setting up the triangle is convenient to do directly in the vertex shader, with no vertex buffer bound (in a core profile context a vertex array object must still be bound, but it can be empty). Call glDrawArrays(GL_TRIANGLES, 0, 3) with this vertex shader:
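```glsl
#version 330 core

void main()
{
    // One triangle whose corners lie outside the screen; it fully contains
    // the [-1, 1] clip-space square, and the overhang is clipped away.
    vec2 coords[3] = vec2[3](
        vec2(-1.0, -1.0),
        vec2( 3.0, -1.0),
        vec2(-1.0,  3.0)
    );
    // No vertex buffer: the built-in gl_VertexID (0, 1, 2) selects the corner.
    gl_Position = vec4(coords[gl_VertexID], 0.0, 1.0);
}
```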
There are three reasons why the single triangle approach is faster.
- The GPU executes fragment shaders in lockstep, 2×2 pixels at a time, to support automatic mip level selection. As a result, your shader runs twice on some pixels along the edge where the two triangles meet. This behavior is mandated by the Direct3D and OpenGL specifications.
- In actual hardware, shading is done 32 or 64 pixels at a time, not four. The problem above just got worse.
- Covering the full screen with a single triangle can make your shader’s memory access patterns linear. See the “case study” section in this AMD performance tuning guide for an example of this and of the second point.
In my microbenchmark the single-triangle method was 0.2% faster than two triangles. We’re definitely deep into micro-optimization territory here 🙂 I suppose if you read textures in your shader you’ll see bigger gains, as happened in the AMD case study linked above.
For more details on the mechanics of automatic mip selection, see the section “Hardware Partial Derivatives” in this 2021 article by John Hable.
Of course you could use compute shaders here, but then you have to make sure the memory access patterns play well with whatever layout your textures (and the framebuffer?) are stored in. So it’s not an obvious win once you take code complexity into account.
Finally, if you still end up drawing two triangles, make sure you’re drawing a four-vertex triangle strip and not a six-vertex list. I’m not claiming there’s a speed difference; this is just for the nerd cred!
For more tricks like this in general, see Bill Bilodeau’s 2014 GDC talk on the subject (slides).
Thanks to mankeli for feedback on a draft of this post.