Fragment Shader Thread Group: Quads
Quads
In the fragment shader, the rasterized primitive is arranged into thread groups before execution. Each group is called a quad and consists of a 2×2 block of fragment threads. Because a quad always runs as a whole, fragments that are not covered by the primitive may also be executed. These fragments are masked out after execution, which means their results are discarded without updating the framebuffer. OpenGL provides the built-in variable gl_HelperInvocation to check whether the running fragment is masked out: it is true if the fragment is masked out and false otherwise. In addition, most GPUs execute threads in batches, such as a warp on NVIDIA GPUs, which is composed of 32 threads. In that case, up to 8 quads effectively run at the same time, as shown in the figure above.
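To make the bookkeeping concrete, here is a minimal CPU-side sketch in Python of how a coverage mask could be split into 2×2 quads. It is a toy model, not how any driver or GPU actually implements rasterization; the function name shade_quads and the labels "active" and "helper" are made up for illustration. A thread labeled "helper" is one for which gl_HelperInvocation would be true.

```python
def shade_quads(coverage):
    """Toy model: coverage is a 2D list of bools with even width and height.

    Returns a 2D list holding "active" (covered fragment), "helper"
    (uncovered fragment that still runs inside a launched quad), or
    None (the quad was never launched).
    """
    height, width = len(coverage), len(coverage[0])
    result = [[None] * width for _ in range(height)]
    for qy in range(0, height, 2):
        for qx in range(0, width, 2):
            pixels = [(qy + dy, qx + dx) for dy in (0, 1) for dx in (0, 1)]
            if not any(coverage[y][x] for y, x in pixels):
                continue  # no covered pixel: this quad is not launched at all
            for y, x in pixels:
                result[y][x] = "active" if coverage[y][x] else "helper"
    return result


# Hypothetical coverage mask of a small primitive on a 4x4 pixel grid.
coverage = [
    [False, False, False, False],
    [False, False, True,  False],
    [False, True,  True,  True],
    [False, False, True,  True],
]
states = shade_quads(coverage)
flat = [s for row in states for s in row]
active = flat.count("active")
helper = flat.count("helper")
print(active, helper)  # 6 6
```

In this example, 6 covered fragments cost 12 thread executions, because two of the three launched quads are only partially covered.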
This arrangement is what makes GLSL functions such as dFdx and dFdy work. The current fragment can peek at the registers of the adjacent fragment thread in the same quad and subtract to compute a partial derivative. A caveat is that calling dFdx or dFdy inside a conditional branch may give unexpected results: if the neighboring thread did not take the branch, the active thread reads the register of that inactive thread, which still holds a stale value. For this reason, GLSL leaves derivatives undefined in non-uniform control flow.
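The branch caveat can be illustrated with a small Python toy model of one quad row (two horizontally adjacent threads). This is only a sketch under simplifying assumptions: the function name dfdx_in_branch and the stale parameter are hypothetical, and real hardware may return any value in this situation, since the result is undefined.

```python
def dfdx_in_branch(values, takes_branch, stale=0.0):
    """Toy model of dFdx evaluated inside a branch for one quad row.

    values:       the value each of the two threads would compute in the branch
    takes_branch: whether each thread actually executes the branch
    stale:        assumed leftover register content of a thread that skipped it
    """
    # A thread that skips the branch never writes its register.
    registers = [v if taken else stale for v, taken in zip(values, takes_branch)]
    diff = registers[1] - registers[0]  # right register minus left register
    # Only threads inside the branch observe a result.
    return [diff if taken else None for taken in takes_branch]


uniform = dfdx_in_branch([0.25, 0.5], [True, True])
divergent = dfdx_in_branch([0.25, 0.5], [True, False])
print(uniform)    # [0.25, 0.25]
print(divergent)  # [-0.25, None]
```

When both threads take the branch, the derivative is the expected 0.25. When only the left thread takes it, the subtraction uses the stale register and the result has the wrong sign. A common way to avoid this is to compute the derivative before the branch and use the stored value inside it.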
Technically, dFdx and dFdy are based on numerical differentiation. How, then, is a fragment on the boundary of a quad handled if only forward or only backward differencing is used? The answer is that both are used. For dFdx, the fragment on the left side of the quad uses forward differencing while the one on the right side uses backward differencing, so neither has to access the next quad. As a consequence, both fragments in the same row get the same result.
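The differencing scheme described above can be sketched in Python for a single 2×2 quad. The function names dfdx and dfdy and the sample values are made up for illustration. This version differences each row and each column separately, which roughly matches what dFdxFine and dFdyFine are specified to do; an implementation of plain dFdx or of dFdxCoarse may reuse a single difference for the whole quad.

```python
def dfdx(quad, x, y):
    """Horizontal derivative for the thread at column x, row y of a 2x2 quad."""
    row = quad[y]
    if x == 0:
        return row[x + 1] - row[x]  # forward difference: f(x + 1) - f(x)
    return row[x] - row[x - 1]      # backward difference: f(x) - f(x - 1)


def dfdy(quad, x, y):
    """Vertical derivative for the thread at column x, row y of a 2x2 quad."""
    if y == 0:
        return quad[y + 1][x] - quad[y][x]  # forward difference
    return quad[y][x] - quad[y - 1][x]      # backward difference


# Hypothetical per-thread values of some shader variable, indexed as quad[y][x].
quad = [
    [1.0, 3.0],
    [2.0, 6.0],
]
dx = [[dfdx(quad, x, y) for x in (0, 1)] for y in (0, 1)]
dy = [[dfdy(quad, x, y) for x in (0, 1)] for y in (0, 1)]
print(dx)  # [[2.0, 2.0], [4.0, 4.0]]
print(dy)  # [[1.0, 3.0], [1.0, 3.0]]
```

Both threads in a row end up with the same horizontal derivative, and both threads in a column with the same vertical derivative, without ever reading outside the quad.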
Reference
[2] https://stackoverflow.com/questions/16365385/explanation-of-dfdx
[3] https://stackoverflow.com/questions/39579150/difference-between-dfdxfine-and-dfdxcoarse