tront.xyz / terrainarray
Sidestepping Unity Shader Graph's 16-sampler limit by stuffing four layer IDs and four blend weights into one RGBA float per pixel. Live WebGL2 demo, full walkthrough, plus an archive of the 2015 Stack Overflow post the whole technique is built on.
The classic Unity-style splat-map approach assigns one terrain layer per channel of an RGBA texture: R dirt, G grass, B rock, A sand. The fragment shader reads the splat map and uses each channel as the blend weight for that layer's texture.
It works, but it caps at four layers per splat map. Want eight? Stack a second splat map and four more layer textures, and you've burned ten sampler slots (two splat maps plus eight layers) just for the ground. The URP TerrainLit shader-graph template caps out at sixteen texture samplers, full stop. Real games want twenty or fifty distinct ground materials; the math doesn't work.
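As a rough check on that budget, the sketch below counts sampler slots for the classic approach. It assumes one sampler per RGBA splat map and one per layer texture, with nothing else bound (no normal maps, no masks). The name `splatSamplerCost` is illustrative and not part of any Unity API.

```javascript
// Hypothetical helper: sampler slots used by classic splat mapping,
// assuming one slot per RGBA splat map and one per layer texture.
function splatSamplerCost(layerCount) {
  const splatMaps = Math.ceil(layerCount / 4); // four blend weights per splat map
  return splatMaps + layerCount;
}

const SAMPLER_LIMIT = 16; // the cap quoted above

// Largest layer count whose cost still fits under the cap.
let maxLayers = 0;
while (splatSamplerCost(maxLayers + 1) <= SAMPLER_LIMIT) maxLayers++;

console.log(maxLayers, splatSamplerCost(maxLayers)); // 12 15
```

Under those assumptions the ceiling is twelve albedo-only layers, well short of twenty or fifty.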
The way out has two pieces: a different kind of texture binding, plus a slightly evil little encoding trick to make that binding pay off.
sampler2DArray
A texture array is a stack of identically-sized images bound as a single GPU resource. You sample it with a 3-component coordinate: uv.xy picks a pixel like normal, a third integer z picks which layer. The whole stack costs one sampler slot, regardless of whether it holds 8 layers or 256.
GLSL 3.0 / fragment
uniform sampler2DArray terrainTextures; // 32 textures, one binding
vec3 grass = texture(terrainTextures, vec3(uv, 5.0)).rgb;
vec3 rock = texture(terrainTextures, vec3(uv, 11.0)).rgb;
That alone fixes the sampler accounting; the splat map is still RGBA though, so it still only carries four blend weights per pixel. You can read from 256 layers but you can only blend 4 of them per pixel unless you find more room in the control map.
A 32-bit float has 32 bits; a 16-bit integer needs 16. So one float can hold two 16-bit values if you bypass the float's numeric meaning and treat it as a bag of bits: write the bits with integer ops, ship them through a FloatType texture, read them back as integers on the other side.
That gives you eight 16-bit slots per RGBA pixel. The demo packs them like this:
R = weight 0 (upper 16 bits) + weight 1 (lower 16 bits)
G = weight 2 (upper 16 bits) + weight 3 (lower 16 bits)
B = layer ID 0 (upper 16 bits) + layer ID 1 (lower 16 bits)
A = layer ID 2 (upper 16 bits) + layer ID 3 (lower 16 bits)
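To make the layout concrete, here is a minimal CPU-side sketch that packs one RGBA texel and decodes it again. The helper names (`packTexel`, `unpackTexel`, `quantize16`, `packPair`) are hypothetical and not taken from the demo; the bit layout follows the four lines above.

```javascript
// One RGBA float texel and an integer view over the same 16 bytes.
const texel = new Float32Array(4);
const texelBits = new Uint32Array(texel.buffer);

// Quantize a 0..1 weight to 16 bits (floor, as the demo's quantization step does).
const quantize16 = (w) => Math.floor(w * 65535);

// Put `hi` in the upper 16 bits and `lo` in the lower 16 bits.
const packPair = (hi, lo) => ((hi << 16) | (lo & 0xFFFF)) >>> 0;

function packTexel(weights, ids) {
  texelBits[0] = packPair(quantize16(weights[0]), quantize16(weights[1])); // R
  texelBits[1] = packPair(quantize16(weights[2]), quantize16(weights[3])); // G
  texelBits[2] = packPair(ids[0], ids[1]);                                 // B
  texelBits[3] = packPair(ids[2], ids[3]);                                 // A
}

function unpackTexel() {
  const hi = (u) => u >>> 16;
  const lo = (u) => u & 0xFFFF;
  return {
    weights: [hi(texelBits[0]), lo(texelBits[0]), hi(texelBits[1]), lo(texelBits[1])]
      .map((q) => q / 65535),
    ids: [hi(texelBits[2]), lo(texelBits[2]), hi(texelBits[3]), lo(texelBits[3])],
  };
}

packTexel([0.5, 0.25, 0.15, 0.1], [5, 27, 0, 11]);
const decoded = unpackTexel();
console.log(decoded.ids); // [ 5, 27, 0, 11 ]
```

This sketch writes through the integer view on purpose. Some packed patterns are NaN bit patterns when read as floats (an upper half of 0xFFFF, which is what a weight of exactly 1.0 quantizes to, is one), and ECMAScript does not guarantee that a NaN's payload bits survive being passed around as a Number. Keeping the bits in a typed array sidesteps that question.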
Crucial property: the four IDs in the B/A channels can be any values in the array, so different pixels can mix completely different sets of four textures. Across the whole terrain you might touch all 32 (or 256, or however many you packed); a single pixel only ever blends four. That's the same compromise AAA terrain shaders run under, and you don't notice because in practice no pixel ever blends more than a couple of materials anyway.
The 2015 Stack Overflow post that this technique traces back to (mirrored at the bottom of this page) phrases it in HLSL. The WebGL2 / Three.js demo uses the GLSL 3.0 equivalent. Same math, different keyword spellings:
HLSL / Adam Miles, SO 2015
// pack
float a = 0.45;
float b = 0.55;
uint aS = a * 0xFFFF;
uint bS = b * 0xFFFF;
uint packed = (aS << 16) | (bS & 0xFFFF);
float f = asfloat(packed);
// unpack
uint u = asuint(f);
float a2 = (u >> 16) / 65535.0;
float b2 = (u & 0xFFFF) / 65535.0;
GLSL 3.0 / this demo
// pack (JS, into Float32 texture)
function packIntsToFloat(a, b){
  _u32[0] = (a << 16) | (b & 0xFFFF);
  return _f32[0];
}
// unpack (fragment shader)
void unpackFloat(float p, out float a, out float b){
  uint u = floatBitsToUint(p);
  a = float(u >> 16u) / 65535.0;
  b = float(u & 0xFFFFu) / 65535.0;
}
The HLSL keywords asuint / asfloat are bit-reinterprets: they take 32 bits and re-read them as the other type without changing a single bit. GLSL 3.0 names the same operation floatBitsToUint / uintBitsToFloat. JavaScript doesn't have such operators directly, so the demo does it with a 4-byte ArrayBuffer aliased by both a Uint32Array and a Float32Array (write through one view, read back through the other).
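The aliasing setup described above might look like the following sketch. `_u32` and `_f32` are the names used in the pack snippet; `_buf` and the inverse function `unpackFloatToInts` are hypothetical additions for illustration.

```javascript
// Scratch buffer: 4 bytes seen through two typed-array views.
const _buf = new ArrayBuffer(4);     // hypothetical name
const _u32 = new Uint32Array(_buf);  // integer view of the 4 bytes
const _f32 = new Float32Array(_buf); // float view of the same 4 bytes

function packIntsToFloat(a, b) {
  _u32[0] = (a << 16) | (b & 0xFFFF); // write as integer bits
  return _f32[0];                     // read the same bits back as a float
}

// Hypothetical inverse, mirroring floatBitsToUint in the shader.
function unpackFloatToInts(f) {
  _f32[0] = f;
  const u = _u32[0];
  return [u >>> 16, u & 0xFFFF];
}

const packedIds = packIntsToFloat(5, 27);
console.log(unpackFloatToInts(packedIds)); // [ 5, 27 ]
```

Note that `packedIds` is numerically meaningless: read as a float it is a tiny denormal value, not 5 or 27. Only its bit pattern matters.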
For pure integer pairs (the layer IDs, where you don't want any rescaling) the JS function above is enough. For 0–1 float pairs (the blend weights), one extra step quantizes each to a 16-bit integer first:
JS
function packFloatsToFloat(a, b){
  return packIntsToFloat(Math.floor(a * 65535), Math.floor(b * 65535));
}
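For a sense of how much precision the 16-bit quantization costs, this small sketch quantizes a few sample weights and measures the round-trip error. The sample values and the names `quantize` / `dequantize` are illustrative.

```javascript
// Quantize a 0..1 value to 16 bits and back, as the weight path does.
const quantize = (w) => Math.floor(w * 65535);
const dequantize = (q) => q / 65535;

const STEP = 1 / 65535; // size of one 16-bit step

const samples = [0, 0.1, 0.45, 0.55, 0.999, 1];
const worstError = Math.max(
  ...samples.map((w) => Math.abs(w - dequantize(quantize(w))))
);

console.log(quantize(0.45), quantize(1)); // 29490 65535
console.log(worstError < STEP);           // true
```

Because `Math.floor` always rounds down, four weights that sum to 1 can decode to a sum slightly below 1 (by a few steps at most). Swapping in `Math.round` would halve the worst-case error per weight; whether that matters visually is a judgment call.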
The CPU work happens once when the splat map is generated (and again whenever the user re-rolls it). Four steps:
DataArrayTexture
Allocate one big Uint8Array sized texRes × texRes × 4 × N, then write each layer's RGBA pixels into its slice. Three.js's DataArrayTexture takes that buffer directly. In the demo the layers are generated procedurally with canvas patterns; in a real game they'd be baked PBR textures loaded from disk.
JS
const arrayData = new Uint8Array(TEX_RES * TEX_RES * 4 * NUM_TEXTURES);
// ...draw each texture into arrayData[t * TEX_RES * TEX_RES * 4 + ...]
const textureArray = new THREE.DataArrayTexture(arrayData, TEX_RES, TEX_RES, NUM_TEXTURES);
textureArray.wrapS = textureArray.wrapT = THREE.RepeatWrapping;
textureArray.generateMipmaps = true;
textureArray.needsUpdate = true; // data textures are only uploaded once this is set
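The slice addressing can be sketched without Three.js at all. The example below fills each layer with a flat colour as a stand-in for the demo's canvas patterns; the tiny sizes, the `layerColors` table and the `LAYER_BYTES` constant are assumptions made so the sketch runs on its own.

```javascript
const TEX_RES = 4;      // tiny values so the sketch runs instantly
const NUM_TEXTURES = 3;
const LAYER_BYTES = TEX_RES * TEX_RES * 4; // RGBA8 bytes per layer

const arrayData = new Uint8Array(LAYER_BYTES * NUM_TEXTURES);

// Hypothetical stand-in for the canvas patterns: one flat colour per layer.
const layerColors = [
  [120, 85, 50],   // layer 0
  [60, 140, 60],   // layer 1
  [128, 128, 128], // layer 2
];

for (let t = 0; t < NUM_TEXTURES; t++) {
  const layerStart = t * LAYER_BYTES; // first byte of layer t's slice
  for (let p = 0; p < TEX_RES * TEX_RES; p++) {
    const o = layerStart + p * 4;
    arrayData[o + 0] = layerColors[t][0]; // R
    arrayData[o + 1] = layerColors[t][1]; // G
    arrayData[o + 2] = layerColors[t][2]; // B
    arrayData[o + 3] = 255;               // A
  }
}

console.log(arrayData.length); // 192
```

Layers sit back to back in the buffer, so layer t always starts at byte t × TEX_RES × TEX_RES × 4.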
Before you can extract a top-4 you need to know the weight of every layer at every pixel. Allocate a temporary array of W × H × N floats (yes, it's big: 512 × 512 × 32 is about 8.4 million values, or 32 MB at 4 bytes per Float32) and add weight to it splat by splat. The demo uses smoothstep falloff so each brush has a soft edge:
JS
const raw = new Float32Array(MAP_SIZE * MAP_SIZE * NUM_TEXTURES);
// base layer 0 everywhere
for (let i = 0; i < MAP_SIZE * MAP_SIZE; i++) raw[i * NUM_TEXTURES] = 1.0;
for (let s = 0; s < numSplats; s++) {
  const cx = Math.random() * MAP_SIZE;
  const cy = Math.random() * MAP_SIZE;
  const radius = 10 + Math.random() * 50;
  const tex = Math.floor(Math.random() * NUM_TEXTURES);
  for (let y = ...; y <= ...; y++)
    for (let x = ...; x <= ...; x++) {
      const dist = Math.hypot(x - cx, y - cy);
      if (dist < radius) {
        let f = 1 - dist / radius;
        f = f * f * (3 - 2 * f); // smoothstep
        raw[(y * MAP_SIZE + x) * NUM_TEXTURES + tex] += f * 3.0;
      }
    }
}
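The snippet above leaves the loop bounds elided. One plausible way to fill them in, shown here as an assumption rather than the demo's actual code, is to iterate the splat's bounding box clamped to the map. `paintSplat` is a hypothetical helper, and the map is shrunk to 16 × 16 with fixed inputs so the result is deterministic.

```javascript
const MAP_SIZE = 16;
const NUM_TEXTURES = 4;
const raw = new Float32Array(MAP_SIZE * MAP_SIZE * NUM_TEXTURES);

// Hypothetical helper: add one soft-edged splat of layer `tex` centred at (cx, cy).
function paintSplat(cx, cy, radius, tex) {
  // Assumed loop bounds: the splat's bounding box, clamped to the map.
  const x0 = Math.max(0, Math.floor(cx - radius));
  const x1 = Math.min(MAP_SIZE - 1, Math.ceil(cx + radius));
  const y0 = Math.max(0, Math.floor(cy - radius));
  const y1 = Math.min(MAP_SIZE - 1, Math.ceil(cy + radius));
  for (let y = y0; y <= y1; y++) {
    for (let x = x0; x <= x1; x++) {
      const dist = Math.hypot(x - cx, y - cy);
      if (dist >= radius) continue; // outside the brush
      let f = 1 - dist / radius;
      f = f * f * (3 - 2 * f); // smoothstep falloff
      raw[(y * MAP_SIZE + x) * NUM_TEXTURES + tex] += f * 3.0;
    }
  }
}

paintSplat(8, 8, 5, 2);
console.log(raw[(8 * MAP_SIZE + 8) * NUM_TEXTURES + 2]); // 3
```

Clamping keeps splats near the border from writing outside the array; a splat that overlaps the edge simply gets cut off.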
This is the compression. For every pixel, list which layers have any weight at all, sort by weight descending, take the four heaviest, normalize them to sum to 1. The other twenty-eight layers' contributions at that pixel are discarded. That's the same compromise Unity's terrain pipeline makes (and as noted above, you don't notice).
JS
for (let i = 0; i < MAP_SIZE * MAP_SIZE; i++) {
  const list = [];
  for (let t = 0; t < NUM_TEXTURES; t++) {
    const w = raw[i * NUM_TEXTURES + t];
    if (w > 0) list.push({ idx: t, weight: w });
  }
  list.sort((a, b) => b.weight - a.weight);
  const top = list.slice(0, 4);
  while (top.length < 4) top.push({ idx: 0, weight: 0 });
  const sum = top[0].weight + top[1].weight + top[2].weight + top[3].weight;
  ...
}
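Pulled out as a standalone function, the per-pixel selection might look like this sketch. `topFour` is a hypothetical name, and the zero-sum fallback is an added assumption (in the demo the base layer guarantees a nonzero sum).

```javascript
// Hypothetical helper: pick the four heaviest layers and normalize their weights.
function topFour(layerWeights) {
  const list = [];
  layerWeights.forEach((w, idx) => {
    if (w > 0) list.push({ idx, weight: w });
  });
  list.sort((a, b) => b.weight - a.weight);      // heaviest first
  const top = list.slice(0, 4);
  while (top.length < 4) top.push({ idx: 0, weight: 0 }); // pad to four entries
  const sum = top.reduce((s, e) => s + e.weight, 0);
  if (sum === 0) {
    // Assumption: nothing painted here, so fall back to layer 0 at full weight.
    top[0].weight = 1;
    return top;
  }
  return top.map((e) => ({ idx: e.idx, weight: e.weight / sum }));
}

// Seven layers, five of them painted; the lightest painted layer (index 6) is dropped.
const result = topFour([1.0, 0, 3.0, 0.5, 0, 2.0, 0.25]);
console.log(result.map((e) => e.idx)); // [ 2, 5, 0, 3 ]
```

The four surviving weights are rescaled so they sum to 1, which is what lets the shader use them directly as blend factors.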
Float32 texture
Pack the four normalized weights into R/G and the four IDs into B/A, using the helpers from the previous section:
JS
controlData[i * 4 + 0] = packFloatsToFloat(top[0].weight / sum, top[1].weight / sum);
controlData[i * 4 + 1] = packFloatsToFloat(top[2].weight / sum, top[3].weight / sum);
controlData[i * 4 + 2] = packIntsToFloat(top[0].idx, top[1].idx);
controlData[i * 4 + 3] = packIntsToFloat(top[2].idx, top[3].idx);
Wrap in a DataTexture with RGBAFormat + FloatType, set filtering to NearestFilter (see the gotcha below), upload. That's the entire CPU side.
Per pixel, the shader does six things: sample the control map, unpack the four weights, unpack the four IDs, sample the texture array four times, multiply each sample by its weight, sum. That's it.
GLSL 3.0 / fragment
precision highp float;
precision highp int;
precision highp sampler2DArray;
uniform sampler2D controlMap; // the Float32 bit-packed splat
uniform sampler2DArray terrainTextures; // all N textures, 1 binding
uniform float uTiling;
in vec2 vUv;
out vec4 fragColor;
void unpackFloat(float p, out float a, out float b){
  uint u = floatBitsToUint(p);
  a = float(u >> 16u) / 65535.0;
  b = float(u & 0xFFFFu) / 65535.0;
}
void unpackInt(float p, out float a, out float b){
  uint u = floatBitsToUint(p);
  a = float(u >> 16u);
  b = float(u & 0xFFFFu);
}
void main(){
  vec4 data = texture(controlMap, vUv);
  float w0, w1, w2, w3, i0, i1, i2, i3;
  unpackFloat(data.r, w0, w1);
  unpackFloat(data.g, w2, w3);
  unpackInt(data.b, i0, i1);
  unpackInt(data.a, i2, i3);
  vec2 tileUv = vUv * uTiling;
  vec3 col = texture(terrainTextures, vec3(tileUv, i0)).rgb * w0
           + texture(terrainTextures, vec3(tileUv, i1)).rgb * w1
           + texture(terrainTextures, vec3(tileUv, i2)).rgb * w2
           + texture(terrainTextures, vec3(tileUv, i3)).rgb * w3;
  fragColor = vec4(col, 1.0);
}
The four texture() calls all sample the same physical binding (the texture array), just at different z indices. Total sampler-slot cost: 2 (one for the control map, one for the array). Same fragment shader scales from 4 layers in the array to 4,000 with zero change.
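For checking shader output against known values, the blend step in `main()` can be mirrored on the CPU. This is only a reference sketch: `layerColor` holds hypothetical flat colours standing in for texture-array samples, and `blendTexel` is an invented name.

```javascript
// Hypothetical flat colours, indexed by layer ID, standing in for array samples.
const layerColor = [];
layerColor[0] = [0.4, 0.3, 0.2];
layerColor[5] = [0.2, 0.6, 0.2];
layerColor[11] = [0.5, 0.5, 0.5];

// Mirror of the shader's main(): weighted sum of four layer samples.
function blendTexel(weights, ids) {
  const col = [0, 0, 0];
  for (let k = 0; k < 4; k++) {
    const sample = layerColor[ids[k]];
    for (let c = 0; c < 3; c++) col[c] += sample[c] * weights[k];
  }
  return col;
}

// Half layer 5, half layer 11; the two unused slots point at layer 0 with weight 0.
const blended = blendTexel([0.5, 0.5, 0, 0], [5, 11, 0, 0]);
console.log(blended);
```

Unused slots still get sampled, but a weight of 0 removes their contribution, which is why padding with layer 0 is harmless.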
Trap: you cannot let the GPU linearly interpolate a bit-packed texture. If it blends a pixel containing layer ID 5 with a neighbouring pixel containing layer ID 27, the bits in between are not a valid packed pair; they decode to garbage like ID 16 and ID 3, neither of which were actually painted there.
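The failure is easy to reproduce on the CPU. The sketch below packs two texels, one holding IDs (5, 5) and one holding (27, 27), averages them as floats the way a linear filter would at the halfway point, and decodes the result. The helper names are hypothetical, and real GPU filtering precision may differ from this double-precision arithmetic.

```javascript
const buf = new ArrayBuffer(4);
const u32 = new Uint32Array(buf);
const f32 = new Float32Array(buf);

const pack = (a, b) => {
  u32[0] = ((a << 16) | (b & 0xFFFF)) >>> 0;
  return f32[0];
};
const unpack = (f) => {
  f32[0] = f;
  return [u32[0] >>> 16, u32[0] & 0xFFFF];
};

const left = pack(5, 5);    // texel painted with layer 5
const right = pack(27, 27); // neighbouring texel painted with layer 27

// What a linear filter computes halfway between the two texels.
const halfway = 0.5 * left + 0.5 * right;

console.log(unpack(halfway)); // [ 16, 16 ]
```

Layer 16 was never painted on either texel, yet that is what the shader would sample. Separately, WebGL2 only allows linear filtering of 32-bit float textures when the OES_texture_float_linear extension is available, so nearest filtering is the safe default on both counts.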
Fix is one line on the JS side:
JS
controlMap.magFilter = controlMap.minFilter = THREE.NearestFilter;
Nearest-neighbour sampling snaps to the nearest control-map pixel and reads its bits intact. The catch: you lose smooth blending across control-map pixel boundaries, so material transitions look blocky if the control map is low-resolution.
The honest fix (used in shipping AAA terrain shaders): the fragment shader manually fetches the four nearest control-map texels itself, unpacks each one independently, samples the texture array for all of them, and does the bilinear blend in math at the very end. More work in the shader, but the visual smoothness comes back. The demo on this page doesn't bother because the control map is 512² over a 200-unit plane; the pixels are small enough that you'd have to zoom in hard to see the seams.
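The manual-bilinear idea can be sketched as a CPU reference before porting it to a shader. Everything here is an assumption for illustration: a 2 × 2 control map whose texels are already unpacked, clamp-to-edge addressing, texel centres at half-integer positions, and flat per-layer colours in place of texture-array samples.

```javascript
const SIZE = 2; // 2x2 control map for the sketch

// Already-unpacked control texels (hypothetical data), row-major.
const control = [
  { weights: [1, 0, 0, 0], ids: [0, 0, 0, 0] },
  { weights: [1, 0, 0, 0], ids: [1, 0, 0, 0] },
  { weights: [1, 0, 0, 0], ids: [2, 0, 0, 0] },
  { weights: [1, 0, 0, 0], ids: [3, 0, 0, 0] },
];
const layerColor = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]];

// Blend one texel's four layers, as the basic shader does.
function shadeTexel(x, y) {
  const cx = Math.min(SIZE - 1, Math.max(0, x)); // clamp to edge
  const cy = Math.min(SIZE - 1, Math.max(0, y));
  const t = control[cy * SIZE + cx];
  const col = [0, 0, 0];
  for (let k = 0; k < 4; k++) {
    for (let c = 0; c < 3; c++) col[c] += layerColor[t.ids[k]][c] * t.weights[k];
  }
  return col;
}

// Manual bilinear: shade the four nearest texels, then interpolate the colours.
function shadeBilinear(u, v) {
  const px = u * SIZE - 0.5; // texel centres sit at half-integer positions
  const py = v * SIZE - 0.5;
  const x0 = Math.floor(px);
  const y0 = Math.floor(py);
  const fx = px - x0;
  const fy = py - y0;
  const c00 = shadeTexel(x0, y0);
  const c10 = shadeTexel(x0 + 1, y0);
  const c01 = shadeTexel(x0, y0 + 1);
  const c11 = shadeTexel(x0 + 1, y0 + 1);
  return c00.map((_, c) =>
    (c00[c] * (1 - fx) + c10[c] * fx) * (1 - fy) +
    (c01[c] * (1 - fx) + c11[c] * fx) * fy
  );
}

console.log(shadeBilinear(0.5, 0.5)); // [ 0.5, 0.5, 0.5 ]
```

The key point is the order of operations: each texel is decoded and shaded on its own, and only the resulting colours are interpolated. The packed bits themselves are never blended. In a shader this costs up to sixteen array samples per pixel instead of four.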
32 procedurally-generated textures, 200 random paint splats, top-4 extraction per pixel, packed into a 512² RGBA / Float32 control map. One sampler binding holds all 32 textures; one binding holds the splat map. The fragment shader does the same six operations as the snippet above.
Drag to orbit. Hover the right-hand palette to isolate a layer, click to lock isolation. Toggle Show layer indices to swap every texture for a giant index number (useful for tracing which IDs the bit-pack landed on).
Thanks to Duck for hitting the 16-sampler wall in public, to Nukadelic for explaining how he side-stepped the same wall on Quest VR, and to Adam Miles for the 2015 Stack Overflow answer (mirrored below) that the whole thing is built on. Three.js / WebGL2 port and page by Tront.
The post Nukadelic linked is the ancestor of this whole technique. Mirrored verbatim below: title, both answers (Adam Miles' accepted high-vote one and SmugLispWeenie's counterpoint), the comment thread, scores, bylines. Stripped of nav clutter and the unrelated sidebar.
In HLSL, how would I go about packing two floats within the range of 0–1 into one float with an optimal precision. This would be incredibly useful to compress my GBuffer further.
Miguel P, Oct 2 2015 at 20:55
// Packing
float a = 0.45;
float b = 0.55;
uint aScaled = a * 0xFFFF;
uint bScaled = b * 0xFFFF;
uint abPacked = (aScaled << 16) | (bScaled & 0xFFFF);
float finalFloat = asfloat(abPacked);
// Unpacking
float inputFloat = finalFloat;
uint uintInput = asuint(inputFloat);
float aUnpacked = (uintInput >> 16) / 65535.0f;
float bUnpacked = (uintInput & 0xFFFF) / 65535.0f;
Adam Miles, Oct 3 2015 at 11:06
Converting floating point numbers to fixed point integers is an error prone idea, due to floats covering much larger magnitudes. Say unpacking sRGB will give you pow(255, 2.2) values, which are larger than 0xffff, and you will need several times that amount for robust HDR. Generally fixed point code is very fragile, obfuscated and a nightmare to debug. People invented floats for a good reason.
There are several 16-bit float formats. IEEE 16-bit float one is optimized for numbers between -1.0 to 1.0, but also support numbers up to 0x10000, just in case you need HDR, still so you will need to normalize your larger floats for it. Then there is bfloat16, which behaves like normal 32-bit float, just with less precision. IEEE 16-bit floats are widely supported by modern CPUs and GPUs, and can also be converted quickly even in software. bfloat16 is just gaining popularity, so you will have to research if it is suitable for your needs. Finally you can introduce your own 16-bit float format, using integer log function, which is provided by most CPUs as a single instruction.
SmugLispWeenie, Aug 20 2019 at 20:46
floatBitsToUint: the bit-reinterpret primitive that makes shader-side unpacking work.