pema.dev

Weird HLSL tricks Unity doesn't want you to know about

Intro

A while back, I wrote a compiler for a shading language that targets a stack based virtual machine which is, itself, written in an (Unity-flavored) HLSL shader. The project was aptly named Shaderception, and was what I consider to be a true display of esotericism - a trait I value highly. While writing it, I learned a few nifty and/or evil tricks, which I’ve been meaning to write down for some time. Let’s get to it.

Dynamically assigning to arrays

If you have ever written HLSL code that makes use arrays, you may have run into issues when trying to assigning values to non-constant indices in the array. For example, the following snippet causes a compilation error:

uint _Idx;
float _MyArray[100];

void bar() {
    // ERROR: array reference cannot be used as an l-value;
    // not natively addressable
    _MyArray[_Idx] = 20;
}

The default qualifier for a variable in global scope in HLSL is uniform, and it turns out code like this is actually legal, just only on non-uniform arrays. Thus, you can prevent the error by adding the static qualifier:

uint _Idx;
static float _MyArray[100];

void bar() {
    _MyArray[_Idx] = 20; // No error :D
}

This does, however, mean that you can no longer assign the array externally - it exists only within the shader code itself. This can still be very useful, though. I use such arrays to store variables and values on the stack in the aforementioned virtual machine:

static float4 stack[128];
static uint stackPtr = 0;
static float4 vtab[256];

It is possible to index in this way into uniform arrays, but doing so requires a redundant loop:

uint _Idx;
float _MyArray[100];

void bar() {
    for (int i = 0; i < 100; i++) {
        // No errors here :P
        if (i == _Idx) _MyArray[_Idx] = 20;
    }
}

Dynamically assigning to vectors and matrices

You can assign to elements of a vector or matrix dynamically too, but sadly, it requires a redundant loop again. Here is an example:

uint _Idx;

void bar() {
    float4 myVec = 0;
    float4x4 myMat = 0;
    for (uint i = 0; i < 4; i++) {
        if (_Idx == i) {
            myVec[_Idx] = 1337;
            myMat[_Idx] = myVec;
        }
    }
}

NaNs are weird

The shading language I wrote supports scalars as well as vectors of different length, just as HLSL. When writing the virtual machine, I had the horrible idea of encoding the size of vector directly in each value by using NaNs to indicate unused channels, and store everying as a float4. For example, a scalar would be encoded as float4(value, NaN, NaN, NaN) and 3D vectors would be float4(value1, value2, value3, NaN) etc. This allowed me to handle typechecking and implicit casting directly in the shader, but at the cost of several painful hours of debugging seemingly nonsensical output. It turns out the HLSL really does not care about your feeling whens NaNs are involved. Here are some key points:

isnan() is a lie

HLSL has an intrinsic for detecting NaN values, isnan(x). Unfortunately, in Unity’s setup, it will lie to you. For example, isnan(sqrt(-1)) evaluates to false. This is because the compiler aggresively optimizes away NaN checks. A workaround is XOR the input with an unbound uniform (which in Unity will be initialized to 0 by default), forcing the compiler to be unable to optimize. Here is a function that you can use instead of isnan:

uint _Stupid;
float isActuallyNan(float val) {
    return isnan(asfloat(_Stupid ^ asuint(val)));
}

Producing NaNs

One must be careful when intentionally producing NaN values. Methods such as sqrt(-1) or (0.0/0.0) do work, but the compiler may erroneously compile out the NaN while doing optimization, such as constant folding. Unbound uniforms to the rescue again - you can use sqrt((-1)^_Stupid). Additionally, those 2 aforementioned methods will generate an annoying warning, so I actually use asfloat(-1) instead, which causes no warning.

Dynamic indexing causes NaN propagation

When accessing an element of a vector or matrix using a dynamic index, if the right hand side of the assignment contains a NaN, the NaN will spread to every element! Consider the following snippet

uint _Idx;

bool foo() {
    float4 oneNan = float4(1, sqrt(-1), 1, 1);
    return isActuallyNan(oneNan[_Idx]);
}

This will return true, even when _Idx is explicitly set to 0. The issue occurs in this expression oneNan[_Idx] which will evaluate to a vector containing only NaN! And the exact same happens for matrices as well.

[forcecase] is great

Writing a virtual machine usually involves some large switch statements. Not too many people seem to know that switch statements, as with other control flow, can be annotated with compiler directives dictating how they are compiled. For switch statements, they are listed on MSDN. The default in most cases will be a series of if-statements, corresponding to the [branch] directive. In almost all cases, though, [forcecase] will yield much better performance, especially when writing cursed abuse of shader code like I was doing for this project. As far as I understand, this basically forces the generation of a jump table. The only caveat is that the compiler will occasionally refuse to compile code using [forcecase]. :(

Constant buffer aliasing

Unity has a limitation on the length of uniform arrays, 1023 to be precise. This limitation comes from limitations of constant buffers, which are used implicitly. In HLSL, a constant buffer can hold 4096 vectors, each with four 32 bit values. Since arrays can also contain matrices, each of which contain 4 vectors, the max amount of 4x4 matrices in a single constant buffer is 1024. The 4x4 matrix is HLSL’s largest primitive type, and is what Unity’s limitation is based on. I’m not sure why the limit is 1023 and not 1024, though. You can work around this limitation using a trick I shall refer to as “cbuffer aliasing”. Here is some code demonstrating it:

cbuffer MyCBuffer {
    float4 _MyArray[1023*4] : packoffset(c0);  
    float4 _MyArray0[1023] : packoffset(c0);
    float4 _MyArray1[1023] : packoffset(c1023);
    float4 _MyArray2[1023] : packoffset(c2046);
    float4 _MyArray3[1023] : packoffset(c3069);
};

By adding 4 unused arrays and manually specifying their packing offsets in the constant buffer such that they overlap with a single, larger array, we can effectively get a much larger array, one that actually reaches the limit of data in a single constant buffer. With this, we can use _MyArray in our code however we want. Though one more thing needs to be done for this to work. If we don’t ever use _Program0 through _Program3, the compiler will optimize them away. Crucially, the compiler must believe that they contribute something to the final output of the shader. My usual trick is therefore to use them in a calculation that I know will never will be hit:

// Hack to prevent Unity from deleting aliased cbuffer
if (uv.x < 0) output = _Program0[0] + _Program1[0] + _Program2[0] + _Program3[0];

...

return output;

Here, uv is input UV coordinates to the fragment shader.

The end

That was all for now. You can find more tricks at my shader-knowledge repo.