TerrainArray / tront.xyz

Sidestepping Unity Shader Graph's 16-sampler limit by stuffing four layer IDs and four blend weights into one RGBA float per pixel. Live WebGL2 demo, full walkthrough, plus an archive of the 2015 Stack Overflow post the whole technique is built on.

The 16-sampler wall

The classic Unity-style splat-map approach assigns one terrain layer per channel of an RGBA texture: R dirt, G grass, B rock, A sand. The fragment shader reads the splat map and uses each channel as the blend weight for that layer's texture.

It works; it also caps at four layers per splat map. Want eight? Stack two splat maps and four more layer textures (now you've burned through ten sampler slots just for the ground: two splat maps plus eight layers). The URP TerrainLit shader-graph template caps out at sixteen texture samplers, full stop. Real games want twenty or fifty distinct ground materials; the math doesn't work.

The way out has two pieces: a different kind of texture binding, plus a slightly evil little encoding trick to make that binding pay off.

Trick 1: sampler2DArray

A texture array is a stack of identically-sized images bound as a single GPU resource. You sample it with a 3-component coordinate: uv.xy picks a pixel like normal, and the third component z picks which layer (in GLSL it is passed as a float and rounded to the nearest integer layer index). The whole stack costs one sampler slot, regardless of whether it holds 8 layers or 256.

That alone fixes the sampler accounting; the splat map is still RGBA though, so it still only carries four blend weights per pixel. You can read from 256 layers but you can only blend 4 of them per pixel unless you find more room in the control map.

Trick 2: bit-packing the control map

A 32-bit float has 32 bits; a 16-bit integer needs 16. So one float can hold two 16-bit values if you bypass the float's numeric meaning and treat it as a bag of bits: write the bits with integer ops, ship them through a FloatType texture, read them back as integers on the other side.

That gives you eight 16-bit slots per RGBA pixel. The demo packs them like this:
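The layout diagram did not survive on this copy of the page, so here is a minimal sketch of it. The only thing the walkthrough fixes is "weights in R/G, IDs in B/A" (see step 4 of the CPU pipeline); which value lands in the high half of each channel is an assumption, and the names `CONTROL_LAYOUT` and `splitWord` are made up for illustration.

```javascript
// Assumed slot order per channel. The article only guarantees
// "weights in R/G, IDs in B/A"; the high/low assignment is a guess.
const CONTROL_LAYOUT = [
  { channel: "R", high: "weight0", low: "weight1" },
  { channel: "G", high: "weight2", low: "weight3" },
  { channel: "B", high: "id0", low: "id1" },
  { channel: "A", high: "id2", low: "id3" },
];

// Each channel is one 32-bit word: high slot in bits 31..16, low slot in bits 15..0.
function splitWord(word32) {
  return { high: word32 >>> 16, low: word32 & 0xFFFF };
}

const slotNames = CONTROL_LAYOUT.flatMap((c) => [c.high, c.low]);
console.log(slotNames.length);            // 8 slots per RGBA texel
console.log(splitWord(0x0005001B));       // { high: 5, low: 27 }
```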

Crucial property: the four IDs in the B/A channels can be any values in the array, so different pixels can mix completely different sets of four textures. Across the whole terrain you might touch all 32 (or 256, or however many you packed); a single pixel only ever blends four. That's the same compromise AAA terrain shaders run under, and you don't notice because in practice no pixel ever blends more than a couple of materials anyway.

The math (HLSL vs GLSL)

The 2015 Stack Overflow post that this technique traces back to (mirrored at the bottom of this page) phrases it in HLSL. The WebGL2 / Three.js demo uses the GLSL ES 3.00 equivalent, the shading language WebGL2 ships with. Same math, different keyword spellings:

HLSL / Adam Miles, SO 2015
// pack
float a = 0.45;
float b = 0.55;
uint  aS = a * 0xFFFF;
uint  bS = b * 0xFFFF;
uint  packed = (aS << 16) | (bS & 0xFFFF);
float f = asfloat(packed);

// unpack
uint  u = asuint(f);
float a2 = (u >> 16) / 65535.0;
float b2 = (u & 0xFFFF) / 65535.0;

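GLSL 3.0 / this demo
// pack (JS, into Float32 texture)
// One 4-byte buffer, viewed as both a uint32 and a float32.
const _buf = new ArrayBuffer(4);
const _u32 = new Uint32Array(_buf);
const _f32 = new Float32Array(_buf);
function packIntsToFloat(a, b){
  _u32[0] = (a << 16) | (b & 0xFFFF);
  return _f32[0];
}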

// unpack (fragment shader)
void unpackFloat(float p, out float a, out float b){
  uint u = floatBitsToUint(p);
  a = float(u >> 16u) / 65535.0;
  b = float(u & 0xFFFFu) / 65535.0;
}

The HLSL keywords asuint / asfloat are bit-reinterprets: they take 32 bits and re-read them as the other type without changing a single bit. GLSL 3.0 names the same operation floatBitsToUint / uintBitsToFloat. JavaScript doesn't have such operators directly, so the demo does it with a 4-byte ArrayBuffer aliased by both a Uint32Array and a Float32Array (write through one view, read back through the other).

For pure integer pairs (the layer IDs, where you don't want any rescaling) the JS function above is enough. For 0–1 float pairs (the blend weights), one extra step quantizes each to a 16-bit integer first:
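The quantizing snippet is missing from this copy, so what follows is a sketch of that extra step rather than the demo's actual code. The names `quantize16`, `packWeightsToWord` and `unpackWeightsFromWord` are hypothetical. Two deliberate differences from the snippets above: it rounds instead of truncating, and it returns the raw 32-bit word instead of a float. The second one matters because a weight near 1.0 in the high slot sets every exponent bit, which is a NaN bit pattern, and a NaN's payload is not guaranteed to survive a trip through a JavaScript number. Storing the word through a Uint32Array view of the texture buffer sidesteps that.

```javascript
// Clamp to 0..1 and scale to the 16-bit integer range 0..65535.
function quantize16(x) {
  const clamped = Math.min(1, Math.max(0, x));
  return Math.round(clamped * 65535);
}

// Pack two 0..1 weights into one unsigned 32-bit word (a in the high half).
function packWeightsToWord(a, b) {
  return ((quantize16(a) << 16) | quantize16(b)) >>> 0; // >>> 0 keeps it unsigned
}

// Inverse: split the word and rescale each half back to 0..1.
function unpackWeightsFromWord(word) {
  return [(word >>> 16) / 65535, (word & 0xFFFF) / 65535];
}

const word = packWeightsToWord(0.45, 0.55);
const [a2, b2] = unpackWeightsFromWord(word);
console.log(word.toString(16), a2, b2);
```

The round trip loses at most half a quantization step per value, roughly 0.0000076.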

The CPU-side pipeline

The CPU work happens once when the splat map is generated (and again whenever the user re-rolls it). Four steps:

1. Fill N textures into a DataArrayTexture

Allocate one big Uint8Array sized texRes × texRes × 4 × N, then write each layer's RGBA pixels into its slice. Three.js's DataArrayTexture takes that buffer directly. In the demo the layers are generated procedurally with canvas patterns; in a real game they'd be baked PBR textures loaded from disk.
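As a rough sketch of that allocation: the helper names `buildLayerStack` and `paintFlat` are invented here, and the flat-colour painter is only a stand-in for the demo's canvas patterns. The Three.js call is left as a comment so the snippet runs without the library.

```javascript
// Allocate texRes * texRes * 4 * layerCount bytes and let a callback
// fill each layer's RGBA slice in place.
function buildLayerStack(texRes, layerCount, paintLayer) {
  const sliceSize = texRes * texRes * 4;
  const data = new Uint8Array(sliceSize * layerCount);
  for (let layer = 0; layer < layerCount; layer++) {
    // subarray() is a view into the big buffer, not a copy.
    const slice = data.subarray(layer * sliceSize, (layer + 1) * sliceSize);
    paintLayer(layer, slice, texRes);
  }
  return data;
}

// Hypothetical painter: one flat, opaque colour per layer.
function paintFlat(layer, slice) {
  for (let i = 0; i < slice.length; i += 4) {
    slice[i] = (layer * 37) & 255;
    slice[i + 1] = (layer * 91) & 255;
    slice[i + 2] = (layer * 151) & 255;
    slice[i + 3] = 255;
  }
}

const stack = buildLayerStack(4, 3, paintFlat);
// With Three.js loaded, the buffer would then be wrapped along these lines:
//   const layers = new THREE.DataArrayTexture(stack, 4, 4, 3);
//   layers.needsUpdate = true;
```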

2. Paint splats into a wide raw-weights array

Before you can extract a top-4 you need to know the weight of every layer at every pixel. Allocate a temporary array sized W × H × N floats (yes it's big: 512 × 512 × 32 = ~32 MB at Float32) and add weight to it splat by splat. The demo uses smoothstep falloff so each brush has a soft edge:
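The brush code is not reproduced on this copy, so here is one way the accumulation could look. The splat shape (`cx`, `cy`, `radius`, `layer`, `strength`) and the function name `paintSplat` are assumptions; only the smoothstep falloff comes from the description above.

```javascript
// Standard smoothstep: 0 below edge0, 1 above edge1, smooth in between.
function smoothstep(edge0, edge1, x) {
  const t = Math.min(1, Math.max(0, (x - edge0) / (edge1 - edge0)));
  return t * t * (3 - 2 * t);
}

// Add one circular splat to the raw weights array (layout: [pixel][layer]).
function paintSplat(raw, width, height, layerCount, splat) {
  const { cx, cy, radius, layer, strength } = splat;
  const x0 = Math.max(0, Math.floor(cx - radius));
  const x1 = Math.min(width - 1, Math.ceil(cx + radius));
  const y0 = Math.max(0, Math.floor(cy - radius));
  const y1 = Math.min(height - 1, Math.ceil(cy + radius));
  for (let y = y0; y <= y1; y++) {
    for (let x = x0; x <= x1; x++) {
      const dist = Math.hypot(x - cx, y - cy);
      const falloff = 1 - smoothstep(0, radius, dist); // 1 at the centre, 0 at the rim
      raw[(y * width + x) * layerCount + layer] += strength * falloff;
    }
  }
}

// 512 * 512 * 32 floats * 4 bytes = 33,554,432 bytes; a tiny grid is used here.
const W = 16, H = 16, N = 4;
const raw = new Float32Array(W * H * N);
paintSplat(raw, W, H, N, { cx: 8, cy: 8, radius: 4, layer: 2, strength: 1 });
```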

3. Top-4 extraction per pixel

This is the compression. For every pixel, list which layers have any weight at all, sort by weight descending, take the four heaviest, normalize them to sum to 1. The other twenty-eight layers' contributions at that pixel are discarded. That's the same compromise AAA terrain shaders make (and as noted above, you don't notice).
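A minimal sketch of that per-pixel step, assuming the raw array is laid out as [pixel][layer] floats. The function name `extractTop4` and the fallback for unpainted pixels (layer 0 at full weight) are assumptions, not something the demo is known to do.

```javascript
// Return the four heaviest layers at one pixel, weights normalized to sum to 1.
function extractTop4(raw, pixelIndex, layerCount) {
  const base = pixelIndex * layerCount;
  const candidates = [];
  for (let layer = 0; layer < layerCount; layer++) {
    const weight = raw[base + layer];
    if (weight > 0) candidates.push({ id: layer, weight });
  }
  candidates.sort((a, b) => b.weight - a.weight); // heaviest first
  const top = candidates.slice(0, 4);
  const sum = top.reduce((acc, c) => acc + c.weight, 0);

  const ids = [0, 0, 0, 0];
  const weights = [0, 0, 0, 0];
  top.forEach((c, i) => {
    ids[i] = c.id;
    weights[i] = c.weight / sum;
  });
  // Assumption: an unpainted pixel falls back to layer 0 at full weight.
  if (top.length === 0) weights[0] = 1;
  return { ids, weights };
}

// One pixel, six layers: layers 2, 5, 3 and 0 win; layer 4 is discarded.
const onePixel = new Float32Array([0.1, 0, 0.4, 0.2, 0.05, 0.25]);
const picked = extractTop4(onePixel, 0, 6);
console.log(picked.ids); // [2, 5, 3, 0]
```

Sorting every pixel's candidate list is simple rather than fast; for a one-off bake on a 512² map it is usually fine.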

4. Pack into a 4-channel Float32 texture

Pack the four normalized weights into R/G and the four IDs into B/A, using the helpers from the previous section:
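The packing loop itself is missing here, so this is a sketch under assumptions: the slot order matches "weights in R/G, IDs in B/A" with the first value of each pair in the high half, `perPixel` is a hypothetical callback returning one pixel's top-4 result, and the words are written through a Uint32Array view of the float buffer so that no bit pattern ever passes through a JavaScript number.

```javascript
// Two 16-bit values into one unsigned 32-bit word (first value in the high half).
function packPair(high, low) {
  return (((high & 0xFFFF) << 16) | (low & 0xFFFF)) >>> 0;
}

function quantize16(x) {
  return Math.round(Math.min(1, Math.max(0, x)) * 65535);
}

// Build the RGBA Float32 control map; perPixel(p) returns { ids, weights }.
function packControlMap(width, height, perPixel) {
  const floats = new Float32Array(width * height * 4);
  const words = new Uint32Array(floats.buffer); // same bytes, integer view
  for (let p = 0; p < width * height; p++) {
    const { ids, weights } = perPixel(p);
    const q = weights.map(quantize16);
    words[p * 4 + 0] = packPair(q[0], q[1]);     // R: weight0 | weight1
    words[p * 4 + 1] = packPair(q[2], q[3]);     // G: weight2 | weight3
    words[p * 4 + 2] = packPair(ids[0], ids[1]); // B: id0 | id1
    words[p * 4 + 3] = packPair(ids[2], ids[3]); // A: id2 | id3
  }
  return floats;
}

const floats = packControlMap(1, 1, () => ({
  ids: [2, 5, 3, 0],
  weights: [0.5, 0.25, 0.25, 0],
}));
// With Three.js loaded, roughly:
//   const control = new THREE.DataTexture(floats, 1, 1, THREE.RGBAFormat, THREE.FloatType);
//   control.minFilter = control.magFilter = THREE.NearestFilter;
//   control.needsUpdate = true;
```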

Wrap in a DataTexture with RGBAFormat + FloatType, set filtering to NearestFilter (see the gotcha below), upload. That's the entire CPU side.

The fragment shader, end-to-end

Per pixel, the shader does six things: sample the control map, unpack the four weights, unpack the four IDs, sample the texture array four times, multiply each sample by its weight, sum. That's it.
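The shader listing did not make it onto this copy, so here is a reconstruction of the six steps as a GLSL ES 3.00 string inside JavaScript, the way a Three.js material would receive it. Treat it as a sketch: the uniform names (`uControl`, `uLayers`, `uTiling`), the varying `vUv`, the helper `unpackPair` and the slot order are all assumptions, and it declares its own output, so it is written with something like a RawShaderMaterial set to GLSL3 in mind.

```javascript
const terrainFragmentShader = /* glsl */ `
precision highp float;
precision highp int;
precision highp sampler2D;       // the control map holds raw bits, keep it highp
precision highp sampler2DArray;

uniform sampler2D uControl;      // RGBA Float32 control map, nearest filtering
uniform sampler2DArray uLayers;  // every terrain layer in one binding
uniform float uTiling;           // hypothetical: layer repeats across the terrain

in vec2 vUv;
out vec4 fragColor;

// Reinterpret the float's bits and split them into two 16-bit values.
uvec2 unpackPair(float p) {
  uint u = floatBitsToUint(p);
  return uvec2(u >> 16u, u & 0xFFFFu);
}

void main() {
  vec4 c = texture(uControl, vUv);                              // 1. sample control map
  vec4 w = vec4(unpackPair(c.r), unpackPair(c.g)) / 65535.0;    // 2. four weights
  vec4 id = vec4(unpackPair(c.b), unpackPair(c.a));             // 3. four layer IDs

  vec2 uv = vUv * uTiling;
  // 4-6. four array samples, each scaled by its weight, then summed
  fragColor =
      texture(uLayers, vec3(uv, id.x)) * w.x +
      texture(uLayers, vec3(uv, id.y)) * w.y +
      texture(uLayers, vec3(uv, id.z)) * w.z +
      texture(uLayers, vec3(uv, id.w)) * w.w;
}
`;
```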

The four texture() calls all sample the same physical binding (the texture array), just at different z indices. Total sampler-slot cost: 2 (one for the control map, one for the array). Same fragment shader scales from 4 layers in the array to 4,000 with zero change.

The gotcha: bilinear filtering destroys bit-packed data

Trap: you cannot let the GPU linearly interpolate a bit-packed texture. If it blends a pixel containing layer ID 5 with a neighbouring pixel containing layer ID 27, the bits in between are not a valid packed pair; they decode to garbage like ID 16 and ID 3, neither of which were actually painted there.
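You can reproduce the failure on the CPU. The sketch below averages two packed ID texels the way a bilinear filter would at the midpoint between them; the helper names `bitsToFloat` and `floatToBits` are made up for illustration, and the layer IDs are arbitrary.

```javascript
const buf = new ArrayBuffer(4);
const u32 = new Uint32Array(buf);
const f32 = new Float32Array(buf);

// Reinterpret a 32-bit word as a float, and back.
function bitsToFloat(word) { u32[0] = word; return f32[0]; }
function floatToBits(value) { f32[0] = value; return u32[0]; }

const texelA = bitsToFloat((5 << 16) | 1);   // layers 5 and 1
const texelB = bitsToFloat((27 << 16) | 5);  // layers 27 and 5

// What linear filtering hands the shader halfway between the two texels.
const midpoint = (texelA + texelB) / 2;
const bits = floatToBits(midpoint);
const decoded = [bits >>> 16, bits & 0xFFFF];
console.log(decoded); // [16, 3]
```

Layers 16 and 3 were painted on neither texel. With IDs this small the blend happens to land on the arithmetic mean of each slot; larger values, such as quantized weights, can scramble far less predictably.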

Nearest-neighbour sampling snaps to the nearest control-map pixel and reads its bits intact. The catch: you lose smooth blending across control-map pixel boundaries, so material transitions look blocky if the control map is low-resolution.

The honest fix (used in shipping AAA terrain shaders): the fragment shader manually fetches the four nearest control-map texels itself, unpacks each one independently, samples the texture array for all of them, and does the bilinear blend in math at the very end. More work in the shader, but the visual smoothness comes back. The demo on this page doesn't bother because the control map is 512² over a 200-unit plane; the pixels are small enough that you'd have to zoom in hard to see the seams.
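To make the manual version concrete, here is a CPU-side reference in JavaScript, not shader code and not taken from the demo. It assumes the packed layout used throughout this page, accumulates a weight per layer instead of sampling colours (equivalent, since the final blend is linear), and the names `decodeTexel` and `bilinearLayerWeights` are hypothetical. In a real shader the four taps would be integer texel fetches.

```javascript
// Decode one control-map texel from its four 32-bit words.
function decodeTexel(words, width, x, y) {
  const o = (y * width + x) * 4;
  const pair = (w) => [w >>> 16, w & 0xFFFF];
  return {
    weights: [...pair(words[o]), ...pair(words[o + 1])].map((q) => q / 65535),
    ids: [...pair(words[o + 2]), ...pair(words[o + 3])],
  };
}

// Fetch the four nearest texels, decode each, then blend the decoded weights.
function bilinearLayerWeights(words, width, height, u, v) {
  const fx = u * width - 0.5;   // texel centres sit at +0.5
  const fy = v * height - 0.5;
  const x0 = Math.floor(fx), y0 = Math.floor(fy);
  const tx = fx - x0, ty = fy - y0;
  const taps = [
    [x0, y0, (1 - tx) * (1 - ty)], [x0 + 1, y0, tx * (1 - ty)],
    [x0, y0 + 1, (1 - tx) * ty],   [x0 + 1, y0 + 1, tx * ty],
  ];
  const clamp = (n, hi) => Math.min(hi, Math.max(0, n));
  const blended = new Map();    // layer id -> blended weight
  for (const [x, y, k] of taps) {
    if (k === 0) continue;
    const texel = decodeTexel(words, width, clamp(x, width - 1), clamp(y, height - 1));
    for (let i = 0; i < 4; i++) {
      if (texel.weights[i] === 0) continue;
      blended.set(texel.ids[i], (blended.get(texel.ids[i]) || 0) + k * texel.weights[i]);
    }
  }
  return blended;
}

// 2x1 control map: left texel is all layer 5, right texel is all layer 27.
const control = new Uint32Array(8);
control[0] = 0xFFFF0000; control[2] = 5 << 16;
control[4] = 0xFFFF0000; control[6] = 27 << 16;
const blend = bilinearLayerWeights(control, 2, 1, 0.5, 0.5);
console.log(blend); // layers 5 and 27 at 0.5 each, no phantom layer 16
```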

Live demo

32 procedurally-generated textures, 200 random paint splats, top-4 extraction per pixel, packed into a 512² RGBA / Float32 control map. One sampler binding holds all 32 textures; one binding holds the splat map. The fragment shader does the same six operations as the snippet above.

Drag to orbit. Hover the right-hand palette to isolate a layer, click to lock isolation. Toggle Show layer indices to swap every texture for a giant index number (useful for tracing which IDs the bit-pack landed on).

Credits

Thanks to Duck for hitting the 16-sampler wall in public, to Nukadelic for explaining how he side-stepped the same wall on Quest VR, and to Adam Miles for the 2015 Stack Overflow answer (mirrored below) that the whole thing is built on. Three.js / WebGL2 port and page by Tront.

Stack Overflow: Pack two floats within range into one float

The post Nukadelic linked is the ancestor of this whole technique. Mirrored verbatim below: title, both answers (Adam Miles' accepted high-vote one and SmugLispWeenie's counterpoint), the comment thread, scores, bylines. Stripped of nav clutter and the unrelated sidebar.

mirrored content / CC BY-SA

Pack two floats within range into one float

Asked Oct 2 2015 / Viewed 5k+ times / Tags: encoding hlsl decoding

QUESTION +3

In HLSL, how would I go about packing two floats within the range of 0–1 into one float with an optimal precision. This would be incredibly useful to compress my GBuffer further.

Miguel P, Oct 2 2015 at 20:55

ANSWER +10 (accepted)

// Packing
float a = 0.45;
float b = 0.55;
uint aScaled = a * 0xFFFF;
uint bScaled = b * 0xFFFF;
uint abPacked = (aScaled << 16) | (bScaled & 0xFFFF);
float finalFloat = asfloat(abPacked);

// Unpacking
float inputFloat = finalFloat;
uint  uintInput = asuint(inputFloat);
float aUnpacked = (uintInput >> 16) / 65535.0f;
float bUnpacked = (uintInput & 0xFFFF) / 65535.0f;

Adam Miles, Oct 3 2015 at 11:06

Jongware, over a year ago
Nice. The "optimal precision" part is clearly bogus, since the number of available bits gets halved. So whatever can be stored in what's left can be considered 'optimal'.

Adam Miles, over a year ago
Yeah, I took it to mean "don't waste any bits" rather than "do the impossible"!

Richard Klassen, over a year ago
Thank you!!! How does this answer only have 3 upvotes!?

ANSWER +1

Converting floating point numbers to fixed point integers is an error prone idea, due to floats covering much larger magnitudes. Say unpacking sRGB will give you pow(255, 2.2) values, which are larger than 0xffff, and you will need several times that amount for robust HDR. Generally fixed point code is very fragile, obfuscated and a nightmare to debug. People invented floats for a good reason.

There are several 16-bit float formats. IEEE 16-bit float one is optimized for numbers between -1.0 to 1.0, but also support numbers up to 0x10000, just in case you need HDR, still so you will need to normalize your larger floats for it. Then there is bfloat16, which behaves like normal 32-bit float, just with less precision. IEEE 16-bit floats are widely supported by modern CPUs and GPUs, and can also be converted quickly even in software. bfloat16 is just gaining popularity, so you will have to research if it is suitable for your needs. Finally you can introduce your own 16-bit float format, using integer log function, which is provided by most CPUs as a single instruction.

SmugLispWeenie, Aug 20 2019 at 20:46

Mirrored from SO #32915724 under CC BY-SA 4.0. Content (C) respective authors. Mirrored as a precaution because the original is the load-bearing reference for the technique on this page, and the network is mortal.

Bit-packed splat maps: 32+ terrain textures, one sampler.
