Neighbourhood sampling order during texture filtering
While tinkering with pixel-art scaling I found a source of glitches in anti-aliased threshold operations that use the derivative functions in shaders, so I worked out a workaround that saves me from having to think too hard about a bunch of corner cases.
When applying a window function in a shader (here I’ll use linear interpolation for simplicity; normally you would let the hardware do something like that, but real logic can be more complex), it’s perfectly reasonable to end up with something like this:
ivec2 i = ivec2(floor(uv));
vec2 weight = uv - vec2(i);
vec4 a = texelFetch(s, i + ivec2(0,0), 0), b = texelFetch(s, i + ivec2(1,0), 0),
     c = texelFetch(s, i + ivec2(0,1), 0), d = texelFetch(s, i + ivec2(1,1), 0);
return mix(mix(a, b, weight.x), mix(c, d, weight.x), weight.y);
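For reference, the same index-and-weight arithmetic can be checked on the CPU. Here’s a minimal one-dimensional Python sketch over a toy one-channel texture (the texture values and the helper names are made up for illustration):

```python
import math

# Toy one-channel "texture"; texelFetch becomes a plain list lookup.
tex = [0.0, 1.0, 4.0, 9.0, 16.0]

def sample(u):
    """One-dimensional analogue of the shader snippet above."""
    i = math.floor(u)             # ivec2 i = ivec2(floor(uv));
    w = u - i                     # vec2 weight = uv - vec2(i);
    a, b = tex[i], tex[i + 1]     # the two texel fetches
    return a * (1.0 - w) + b * w  # mix(a, b, w)

print(sample(1.5))  # halfway between tex[1] and tex[2] -> 2.5
```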
From that you might build things up, apply other filters, and whatever else; but what I noticed is that when subsequent operations want to use the derivative functions (dFdx(), dFdy(), and fwidth()) then there can be problems.
Problems like this:
Imagine you’re sampling along a line spanning from pixels M to R in the example below. Ignore the Y dimension for now. You’ll see a, b, and weight take step changes every time an integer boundary is crossed:
Those step changes cause dFdx() to return much larger values than expected at the transitions, even while the output of the function itself appears smooth.
This can affect the LOD calculation in mipmapping (though in a case like this you should probably be doing that manually), and the small transition band used for anti-aliased thresholding becomes an unexpectedly large band.
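To make the spike concrete, here’s a hypothetical CPU reproduction, using finite differences across an integer boundary to stand in for what dFdx() sees between adjacent fragments (the texture values and step size are made up):

```python
import math

tex = [0.0, 1.0, 4.0, 9.0, 16.0]

def sample(u):
    i = math.floor(u)
    w = u - i
    return tex[i] * (1.0 - w) + tex[i + 1] * w

# Central differences across the boundary at u = 2, standing in for dFdx().
h = 0.25
d_out = sample(2.0 + h) - sample(2.0 - h)        # the blended output
d_i = math.floor(2.0 + h) - math.floor(2.0 - h)  # the integer index i
d_w = (2.0 + h - math.floor(2.0 + h)) - (2.0 - h - math.floor(2.0 - h))

print(d_out)  # 2.0  -- the output varies smoothly across the boundary
print(d_i)    # 1    -- i steps, though u only moved by 0.5
print(d_w)    # -0.5 -- weight steps backwards while u moved forwards
```

The output itself changes by a sensible amount, but the intermediates i and weight jump, which is exactly what blows up any derivative taken from them.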
Here’s a workaround I came up with:
ivec4 i = ivec4(floor((uv.xyxy + vec4(1,1,0,0)) / 2.0) * 2.0 + vec4(0,0,1,1));
vec2 weight = abs(uv - vec2(i.xy));
vec4 a = texelFetch(s, i.xy, 0), b = texelFetch(s, i.zy, 0),
     c = texelFetch(s, i.xw, 0), d = texelFetch(s, i.zw, 0);
return mix(mix(a, b, weight.x), mix(c, d, weight.x), weight.y);
This rearranges the offset coordinates so that a always gets a pixel from an even column and even row index, d always gets a pixel from an odd column and odd row, etc.
Consequently, variables change like so:
While a and b do still take step changes, they do so when their corresponding weights are zero. Depending on the situation this may take care of the problem already, or it may be necessary to rearrange a bit more of the arithmetic so that the multiplication by the zero weight happens earlier, forcing the switch to appear smooth.
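A one-dimensional Python sketch of the even/odd indexing (toy texture values, helper names of my own choosing) makes the zero-weight property easy to verify:

```python
import math

tex = [0.0, 1.0, 4.0, 9.0, 16.0]

def indices(u):
    # Nearest even integer and nearest odd integer to u, mirroring
    # floor((u + 1) / 2) * 2 and floor(u / 2) * 2 + 1 from the shader.
    even = int(math.floor((u + 1.0) / 2.0) * 2.0)
    odd = int(math.floor(u / 2.0) * 2.0 + 1.0)
    return even, odd

def sample(u):
    even, odd = indices(u)
    w = abs(u - even)  # 0 at even texels, 1 at odd texels
    return tex[even] * (1.0 - w) + tex[odd] * w

# The even tap only changes at odd integers (where its weight is 1), and
# the odd tap only changes at even integers (where its weight is 0):
print(indices(1.9), indices(2.1))  # (2, 1) (2, 3) -- odd tap steps at u = 2
print(sample(1.5), sample(2.5))    # matches plain bilinear: 2.5 6.5
```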
Alternatively, it’s possible to be careful and calculate the derivatives on the continuous values earlier in the code. That’s probably better if you don’t mind doing it, but it can mean carrying around a bit more data and remembering to do things that you might not otherwise want to remember.
Another benefit of doing it this way: if one input pixel contains an outlier which causes a different path to be taken, then all the fragments touching that pixel see the aberration at the same stage in the function, so they can all take the same branch in unison, rather than each having to branch at a different stage depending on that pixel’s position relative to themselves (though branching may still cause problems with derivatives).
This can be extended to a mod-n system for kernels of size n, capturing pixels into something more like a ring buffer, where only the edge cases (still the zero-weighted cases) get updated during a transition.
vec4 ix = floor((uv.x + vec4(2,1,0,-1)) / 4.0) * 4.0 + vec4(0,1,2,3);
vec4 iy = floor((uv.y + vec4(2,1,0,-1)) / 4.0) * 4.0 + vec4(0,1,2,3);
vec4 weightx = window(abs(uv.x - ix));  // window() is zero at distance 2
vec4 weighty = window(abs(uv.y - iy));
vec4 acc = vec4(0);
for (int i = 0; i < 4; ++i) {
    for (int j = 0; j < 4; ++j) {
        acc += texelFetch(s, ivec2(ix[j], iy[i]), 0) * weightx[j] * weighty[i];
    }
}
The ring-buffer analogy is misleading, of course, because the adjacent pixels are computed concurrently and they all fill up their own private copies of the buffer at the same time without sharing context, so there isn’t the bandwidth saving of a classical ring buffer. But the real point is that they all have mostly the same values at the same offsets, and so this mitigates a class of glitches in the derivatives.
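One way to pin down the tap positions is to keep each mod-4 slot holding the unique congruent integer inside a window centred on the sample point, so a tap only moves when it sits at the window edge, where a radius-2 kernel weights it at zero. A quick Python check (the helper name is my own):

```python
import math

def tap_indices(u, n=4):
    # For each residue k mod n, the unique integer congruent to k inside
    # the window (u - n/2, u + n/2]. A tap only moves when it sits exactly
    # n/2 from u, where a radius-n/2 window weights it at zero.
    return [int(math.floor((u + n // 2 - k) / n) * n + k) for k in range(n)]

print(tap_indices(5.5))  # [4, 5, 6, 7] -- four consecutive texels,
                         # each held in a fixed mod-4 slot

# The k = 0 slot holds texel 4 until u crosses 6, where it jumps to 8;
# both the old and new taps sit at distance 2, i.e. at zero weight.
print(tap_indices(5.9)[0], tap_indices(6.1)[0])  # 4 8
```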