Fragment Shader Thread Group: Quads

Quads

In a fragment shader, the fragments of a rasterized primitive are grouped into small thread groups called quads before execution. Because a quad is a 2 × 2 block of threads, fragments that the primitive does not cover may still be executed. Such a fragment is called a helper pixel (a helper invocation in GLSL terms), and the extra work it causes is sometimes called quad overshading. Helper pixels are masked out after execution, so their results are discarded without updating the framebuffer. GLSL provides the built-in variable gl_HelperInvocation to check whether the running fragment is masked out: it is true if the fragment is masked out and false otherwise. In addition, most GPUs execute threads in batches, such as a warp on NVIDIA GPUs, which consists of 32 threads. In that case, at most 8 quads (32 / 4) can effectively run at the same time, as shown in the figure above.
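The quad bookkeeping described above can be sketched on the CPU. The following Python snippet is an illustration only, not shader code: it assumes quads are aligned to even pixel coordinates and that coverage is given as a set of (x, y) pixels. The name `count_quad_invocations` is hypothetical.

```python
def count_quad_invocations(covered):
    """Return (covered_count, helper_count, quad_count) for covered (x, y) pixels.

    Assumption: quads are 2x2 blocks aligned to even pixel coordinates.
    """
    covered = set(covered)
    # Every quad that contains at least one covered pixel is launched in full.
    quads = {(x // 2, y // 2) for (x, y) in covered}
    launched = 4 * len(quads)
    # Launched threads that are not covered by the primitive are helper pixels.
    helpers = launched - len(covered)
    return len(covered), helpers, len(quads)


# A thin sliver touching three pixels along a diagonal.
covered_px = [(0, 0), (1, 1), (2, 2)]
n_covered, n_helpers, n_quads = count_quad_invocations(covered_px)
print(n_covered, n_helpers, n_quads)  # 3 5 2

WARP_SIZE = 32                    # threads per warp on NVIDIA GPUs, as stated above
QUADS_PER_WARP = WARP_SIZE // 4   # 4 threads per quad
print(QUADS_PER_WARP)             # 8
```

In this example three covered pixels cause eight threads to be launched, so more than half of the work is spent on helper pixels. This is why thin or tiny triangles tend to be costly.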

This arrangement is useful when GLSL functions such as dFdx or dFdy are executed. The current fragment can read the registers of an adjacent fragment thread in the same quad and subtract to approximate a partial derivative. A caveat is that calling dFdx or dFdy inside a conditional branch may give unexpected results: if a neighboring thread in the quad did not take the branch, the active thread reads that inactive thread's register, which still holds an old value.
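The branch caveat can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not a description of any particular GPU: it models only the top row of a quad, treats a register as a plain list slot, and assumes an inactive lane simply keeps whatever value it held before the branch. The function name `dfdx_in_branch` is hypothetical.

```python
def dfdx_in_branch(values, active, stale):
    """Toy model of dFdx for the top row of a quad inside a branch.

    values: what each lane would write to the register inside the branch
    active: which lanes actually execute the branch
    stale:  register contents left over from before the branch
    """
    # Inactive lanes never write, so their registers keep the stale contents.
    regs = [v if a else s for v, a, s in zip(values, active, stale)]
    left, right = regs[0], regs[1]
    return right - left


# Both lanes take the branch: the difference is the true horizontal delta.
print(dfdx_in_branch([1.0, 1.5], [True, True], [9.0, 9.0]))   # 0.5
# The right lane skips the branch: the left lane subtracts against a stale 9.0.
print(dfdx_in_branch([1.0, 1.5], [True, False], [9.0, 9.0]))  # 8.0
```

A common way to avoid the problem is to compute the derivative before the branch, where all four threads of the quad are still active, and only use the result inside it.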

Meanwhile, dFdx and dFdy are based on numerical differentiation. How is a fragment on the boundary of a quad handled if forward or backward differencing is used? In general, the derivative computation is hidden from the user, and there is no exact specification of how it must be done. A common approach is to use both kinds of differencing. For dFdxFine, the left fragment uses forward differencing while the right fragment uses backward differencing, which avoids accessing the next quad and means that the two fragments in a row get the same result. dFdxCoarse, on the other hand, can be computed as the top-right fragment minus the top-left fragment, with that single value used for all 4 fragments. In fact, dFdx returns either dFdxCoarse or dFdxFine, and dFdy works the same way. The choice may depend on factors such as performance or the API hint GL_FRAGMENT_SHADER_DERIVATIVE_HINT.
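The difference between the fine and coarse variants can be sketched in Python for a single quad. This follows the scheme described above and is only one possible behavior, since the exact method is implementation-defined. The functions `dfdx_fine` and `dfdx_coarse` are hypothetical stand-ins for the GLSL built-ins, and the quad is assumed to be stored as `[[top_left, top_right], [bottom_left, bottom_right]]`.

```python
def dfdx_fine(quad):
    """Per-row horizontal difference; both fragments in a row share it."""
    # Left uses forward differencing (right - left) and right uses backward
    # differencing (also right - left), so the two results are identical.
    return [[row[1] - row[0]] * 2 for row in quad]


def dfdx_coarse(quad):
    """One horizontal difference (top row here) broadcast to all 4 fragments."""
    d = quad[0][1] - quad[0][0]
    return [[d, d], [d, d]]


quad = [[1.0, 3.0],
        [2.0, 7.0]]
print(dfdx_fine(quad))    # [[2.0, 2.0], [5.0, 5.0]]
print(dfdx_coarse(quad))  # [[2.0, 2.0], [2.0, 2.0]]
```

The fine variant keeps a separate value per row, while the coarse variant loses the bottom row's steeper change, trading accuracy for less work.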

