In my first WebGL/three.js post, I used the GPU to display the Mandelbrot set: I set attributes at each corner of two triangles to hold the real and imaginary components of their location in the complex plane, and used a fragment shader to see if each pixel in between was in the Mandelbrot set or not. This is fine as far as it goes, but it doesn't go very far. It'd be nice to do calculations in the shader and have the shader function return a value, or to update global variables that can be accessed in the JavaScript or in the shader on the next iteration, instead of just colouring pixels on the screen.
The way forward is to render to an undisplayed buffer instead of the screen. That buffer can be accessed in the JavaScript to give an array of pixel values (RGBA 0-255), and it can also be used as an input texture to another shader (there's one tiny wrinkle in doing this with three.js; I am using r81 and I think it was different in r50ish, so watch out if you Google your way to some old examples). Usually textures are regular pictures (e.g., this wooden cube), but in this post I'll be using each pixel as a variable and doing some physics with it. Useful Google searches are 'render-to-texture' and 'framebuffer'; I used this post (pure WebGL, not three.js) as a guide early on.
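As a taste of the readback side, with three.js r81 something along these lines pulls a render target's pixels back into JavaScript (a sketch of mine, assuming a renderer and a THREE.WebGLRenderTarget called my_buffer already exist):
var pixels = new Uint8Array(4 * width * height);  // 4 bytes (RGBA) per pixel
renderer.readRenderTargetPixels(my_buffer, 0, 0, width, height, pixels);
// pixels now holds the buffer's contents as 0-255 RGBA values.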
(There's an existing framework for GPU calculations in three.js, as used in this example. It might be substantially better than what I've done, but I haven't studied it at all, because I only discovered it after I was well committed to my method.)
The main application for doing physics in WebGL is for writing computer games (or just fun demos), and that is where I'd like to go eventually. But as a first test of WebGL's capabilities (and my capabilities with WebGL), I decided to try to reproduce Figure 4 from this obscure 2008 paper on the Ising model, since that calculation – integrating 1000 trajectories of a system of 100 stochastic differential equations – is extremely well-suited to GPU parallelisation. To try to keep my audience above 2 (I'm optimistic), the first few sections of this post will describe general principles, with the statistical mechanics left to the end.
Doing serious calculations, with no fancy graphic output, is not the main use case for WebGL, and it is definitely not the best way to do GPU physics. I say this without knowing anything about OpenCL or CUDA – whatever is involved in those frameworks, it can't be as annoying and inconvenient as WebGL (at least, as annoying and inconvenient as the methods I thought of in WebGL). WebGL is based on a now-quite-old version of OpenGL (the idea is that WebGL should be able to run on as many devices as possible), and there were a few times when working on this that I wished I could use the more recent OpenGL functions that Google was showing me.
Some constraints on WebGL programming: among other things, the shader language has no floatBitsToUint() or uintBitsToFloat(), so you need your own functions to do the conversions.
Still, quite apart from my end goal of writing a web-based computer game, re-doing a calculation from my Master's thesis with tools designed for scientific computing would just be repeating a nearly-decade-old calculation on better hardware, which would be utterly boring. Instead, I want to reproduce the most interesting calculation from my thesis in a web browser, which is funny. And it works! I wrote an implementation in C, and Chrome beats it by a factor of about 7 or 8. (My C code runs on a single core and could probably be optimised. But the braindead programmer who wrote the C code is the same braindead programmer who wrote the GPU shaders, so the comparison has at least some practical relevance for programmers at my level of ability.)
A texture is an image that the fragment shader can read. Usually you'd put a texture on the face of an object, or (say) around a sphere. For most regular applications, the image should have sides of length a power of 2.
Rather than starting with a pre-made image, I instead want to create images pixel-by-pixel, storing appropriate values in each (the mapping between pixels and floats is described in the next section). You can do this by creating a canvas element, which allows pixel manipulation on the RGBA channels with the getImageData() and putImageData() methods of the canvas's 2D context.
If you just want a pretty picture, this amount of explanation would be fine. But I want each of the four channels to be exactly what I set it to, because these pixels are storing variables, and it turns out that when you use putImageData() with non-trivial values in the alpha channel, the browser will often make an approximation to the red, green, and blue channels (and these approximations vary by browser!), so that when you getImageData(), you get slightly different numbers. (The term to Google is 'premultiplied alpha'.) It's annoying at best to have many of my values off by some fraction of a percent, and it would be disastrous if one of the altered colours flipped what I was reading as the sign bit of the floating point value the pixel was storing.
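A quick way to see the problem (a sketch to run in the browser console; the exact output varies by browser):
var c = document.createElement("canvas");
c.width = 1; c.height = 1;
var ctx = c.getContext("2d");
var img = ctx.getImageData(0, 0, 1, 1);
img.data[0] = 37; img.data[1] = 150; img.data[2] = 255; img.data[3] = 100;
ctx.putImageData(img, 0, 0);
console.log(ctx.getImageData(0, 0, 1, 1).data);
// With alpha = 100, many browsers log values close to, but not exactly,
// [37, 150, 255, 100]; with alpha = 255 the round trip is exact.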
The solution is to create two textures, both always having the alpha channel at 255: something like (R, G, B, 255) in the first texture and (A, A, A, 255) in the second. Fortunately the output of the shaders is pixel-perfect, so once the two initial textures have been combined, only one texture is needed to store all four channels.
The basic building blocks of the calculations are squares running (in world space) from -1 to 1 in x and y, with z = 0. These are created with THREE.PlaneBufferGeometry(), which is combined with a THREE.ShaderMaterial() to create a THREE.Mesh(), which can be added to a THREE.Scene() and then rendered. For a texture to be read by a shader, it must be a uniform of the shader material. The critical bit of the setup in the JavaScript is at the end of the following code excerpt; it shows the initialisation of the random number generator, and includes drawing the initial seeds to canvas (a more complete description of the RNG is in the next section).
function init_rng_texture(lfg_r, lfg_s, num_rand_cols, num_rows) {
// Initialises and returns canvases (rgb and alpha) for the
// Lagged Fibonacci generator.
var i, j, k;
var canvas = document.createElement("canvas");
var context = canvas.getContext("2d");
var width = (lfg_r + 1) * num_rand_cols;
var height = num_rows;
canvas.width = width;
canvas.height = height;
context.fillStyle = "#000000";
context.fillRect(0, 0, width, height);
var canvas_alpha = document.createElement("canvas");
var context_alpha = canvas_alpha.getContext("2d");
canvas_alpha.width = width;
canvas_alpha.height = height;
context_alpha.fillStyle = "#000000";
context_alpha.fillRect(0, 0, width, height);
var pixel_data = context.getImageData(0, 0, width, height);
var pixels = pixel_data.data;
var pixel_data_alpha = context_alpha.getImageData(0, 0, width, height);
var pixels_alpha = pixel_data_alpha.data;
var ct = 0;
var this_pixel, r;
for (i = 0; i < num_rows; i++) {
for (j = 0; j < num_rand_cols; j++) {
// lfg_r initial values in the "array":
for (k = 0; k < lfg_r; k++) {
r = Math.random();
// This was useful for debugging, comparing my JS
// implementation to C:
//r = Math.abs(Math.sin(k + 1));
this_pixel = value_to_pixel(r);
pixels[4*(i*width + (lfg_r + 1)*j + k) + 0] = this_pixel[0];
pixels[4*(i*width + (lfg_r + 1)*j + k) + 1] = this_pixel[1];
pixels[4*(i*width + (lfg_r + 1)*j + k) + 2] = this_pixel[2];
pixels[4*(i*width + (lfg_r + 1)*j + k) + 3] = 255;
// alpha channel goes into a colour channel of a separate map to
// avoid problems with pre-multiplied alpha.
pixels_alpha[4*(i*width + (lfg_r + 1)*j + k) + 0] = this_pixel[3];
pixels_alpha[4*(i*width + (lfg_r + 1)*j + k) + 1] = this_pixel[3];
pixels_alpha[4*(i*width + (lfg_r + 1)*j + k) + 2] = this_pixel[3];
pixels_alpha[4*(i*width + (lfg_r + 1)*j + k) + 3] = 255;
}
// The final pixel points to the pixel to read; starts at
// zero, with the 0.1 added to avoid floating point issues
// and the floor() function in the shader:
this_pixel = value_to_pixel(0.1);
pixels[4*(i*width + (lfg_r + 1)*j + lfg_r) + 0] = this_pixel[0];
pixels[4*(i*width + (lfg_r + 1)*j + lfg_r) + 1] = this_pixel[1];
pixels[4*(i*width + (lfg_r + 1)*j + lfg_r) + 2] = this_pixel[2];
pixels[4*(i*width + (lfg_r + 1)*j + lfg_r) + 3] = 255;
pixels_alpha[4*(i*width + (lfg_r + 1)*j + lfg_r) + 0] = this_pixel[3];
pixels_alpha[4*(i*width + (lfg_r + 1)*j + lfg_r) + 1] = this_pixel[3];
pixels_alpha[4*(i*width + (lfg_r + 1)*j + lfg_r) + 2] = this_pixel[3];
pixels_alpha[4*(i*width + (lfg_r + 1)*j + lfg_r) + 3] = 255;
}
}
context.putImageData(pixel_data, 0, 0);
context_alpha.putImageData(pixel_data_alpha, 0, 0);
return {"canvas_map": canvas, "canvas_alpha_map": canvas_alpha};
}
var rng_canvas_obj = init_rng_texture(gl_all.lfg_r, gl_all.lfg_s, gl_all.num_rand_cols, gl_all.rng_rows);
var rng_canvas = rng_canvas_obj.canvas_map;
var rng_canvas_alpha = rng_canvas_obj.canvas_alpha_map;
var rng_texture_init = new THREE.Texture(rng_canvas);
rng_texture_init.needsUpdate = true;
rng_texture_init.minFilter = THREE.NearestFilter;
rng_texture_init.magFilter = THREE.NearestFilter;
var rng_texture_init_alpha = new THREE.Texture(rng_canvas_alpha);
rng_texture_init_alpha.needsUpdate = true;
rng_texture_init_alpha.minFilter = THREE.NearestFilter;
rng_texture_init_alpha.magFilter = THREE.NearestFilter;
var material_rng_init = new THREE.ShaderMaterial({
"uniforms": {
"texture": {"type": "t", "value": rng_texture_init},
"alpha": {"type": "t", "value": rng_texture_init_alpha},
},
"vertexShader": document.getElementById("shader_vertex_default").textContent,
"fragmentShader": document.getElementById("shader_fragment_texture_alpha").textContent
});
var plane = new THREE.PlaneBufferGeometry(2, 2);
var quad = new THREE.Mesh(plane, material_rng_init);
var scene_rng_init = new THREE.Scene();
scene_rng_init.add(quad);
In the above, there are two uniforms – one for the RGB texture and one for the alpha texture – and each has type "t" for "texture". What about the shaders? All of the materials in this page have the same vertex shader, whose only functions are to generate the gl_Position (which is compulsory anyway) and to pass so-called uv-coordinates to the fragment shader. The uv coordinates are defined so that in the fragment shader you can locate the current pixel on a scale of 0 to 1 in both x- and y-directions; this 0-to-1 scale is also what is needed to access pixels from the texture. A variable is passed from the vertex shader to the fragment shader by declaring a varying; three.js helpfully calculates the uv attribute for us, and I call the varying uv2 (because it's a vec2, and because for a while I was using vec4's for the same purpose).
varying vec2 uv2;
void main() {
uv2 = uv;
gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
}
The fragment shader to combine the RGB and alpha textures is as follows. Textures are of type sampler2D; their pixel values are accessed with texture2D(); the latter function returns a vec4, and the snazzy way you can extract and combine components of this vec4 (and indeed vectors generally – it's called "swizzling") is perhaps the only fun thing in GLSL. For example, texture2D(texture, uv2).rgb is a vec3, and .bgr would give the same three components in reverse order.
varying vec2 uv2;
uniform sampler2D texture;
uniform sampler2D alpha;
void main() {
gl_FragColor = vec4(texture2D(texture, uv2).rgb, texture2D(alpha, uv2).r);
}
That fragment shader, and all the others described in this post, needs to output its pixels to an offscreen buffer if it is to be used for any calculations. In three.js, such a buffer is created with THREE.WebGLRenderTarget(width, height, options). The first two arguments are in pixels; options is an object holding properties of the texture that gets created, and these properties are the same as for any other texture in three.js. I set the minFilter and magFilter both to THREE.NearestFilter; I don't know if this is strictly necessary (usually the textures won't need to be magnified or miniaturised), but I certainly don't want the shaders to do any averaging across pixels. It's also handy if for debugging purposes you want to render a small buffer to the screen – with the NearestFilter, the pixels of the texture can get enlarged for easier visual inspection.
gl_all.rng1_buffer = new THREE.WebGLRenderTarget(
gl_all.rng_width,
gl_all.rng_rows,
{"minFilter": THREE.NearestFilter, "magFilter": THREE.NearestFilter, "format": THREE.RGBAFormat});
That buffer's texture is generated automatically by three.js and is accessed by (in this case) gl_all.rng1_buffer.texture. The idea is to create a ShaderMaterial which has that texture as a uniform, and whose fragment shader does calculations with it. But since most such calculations will involve updating the texture in some way, and since the various pixels' calculations are done in parallel, there need to be two buffers – one to hold the input texture, and one to send the output to. You can't just go editing the pixels on the fly when you might need the original pixels elsewhere.
So I create an otherwise identical buffer called gl_all.rng2_buffer (and this explains the '1' in the buffer name above), and use its texture as a uniform for my fragment shader; the order of 1 and 2 is of course irrelevant. Once the shader has taken buffer 2's texture and written the update to buffer 1, I swap the buffers so that they're ready for the next loop. Details of the shader are left to later in the page, and the following is just what it looks like in the JavaScript.
function swap_buffers(container, base_name) {
// Swaps the buffers, assumes some regularity in the names:
var b1 = base_name + "1_buffer";
var b2 = base_name + "2_buffer";
var temp_buffer = container[b1];
container[b1] = container[b2];
container[b2] = temp_buffer;
// Not exactly general-purpose code, but useful for this project,
// updating the uniform with the newly swapped buffers.
if (base_name == "rng") {
gl_all.material_rng.uniforms.rng_texture.value = gl_all.rng2_buffer.texture;
}
}
// The material for the RNG, uses rng2_buffer as a texture.
gl_all.material_rng = new THREE.ShaderMaterial({
"uniforms": {"rng_texture": {"type": "t", "value": gl_all.rng2_buffer.texture}},
"vertexShader": document.getElementById("shader_vertex_default").textContent,
"fragmentShader": shader_aux_fns + document.getElementById("shader_fragment_rng").textContent
});
quad = new THREE.Mesh(plane, gl_all.material_rng);
gl_all.scene_rng = new THREE.Scene();
gl_all.scene_rng.add(quad);
// Initial render (I haven't defined the renderer or camera in the
// visible part of this post, but it's all standard).
// Render is to rng1_buffer, which then gets swapped to rng2_buffer.
gl_all.renderer.render(scene_rng_init, gl_all.camera, gl_all.rng1_buffer, true);
swap_buffers(gl_all, "rng");
// ...lots of other code might go here...
// Loop:
for (var i = 0; i < 1000; i++) {
// Update the random numbers:
gl_all.renderer.render(gl_all.scene_rng, gl_all.camera, gl_all.rng1_buffer, true);
swap_buffers(gl_all, "rng");
// ...do something with the random numbers...
}
And that shows the basic structure of how to set up GPU calculations; the Ising model calculation below on this page has a couple more pairs of buffers for other steps. Of all things, it was swapping the buffers that caused me the most trouble here, and I still don't really understand how it works – it looks to me like it's all making shallow copies, so that by the end b2 should be literally the same as b1 and then cause errors when the shader next tries to run. And I got that error for quite a while, until I fudged my way to the particular ordering of the lines that you see above, inside their own function. I don't question it.
The shaders read and write pixel values on each of the red, green, blue, and alpha channels. Each channel is an 8-bit unsigned integer, i.e., 0-255, though inside the shader it appears as a float scaled to the range 0 to 1. Since single-precision floating point uses 32 bits to store numbers, the four channels are enough to pack all of the bits of a float into a pixel. (Some devices – older ones, I assume, but I haven't tested – don't support 32-bit floats in shaders, and I don't support such devices.)
Someone asked how to do this conversion on Stack Overflow a few years ago, and got this moderately upvoted answer, which, when I tried to adapt it, gave results that were off by up to 0.3%. I don't know if the error was in the SO answer or introduced by me (certainly their code is not clean), but in any case I sat down with a pen, paper, and the relevant Wikipedia article, and wrote my own functions. (OpenGL from version 3.30 has functions that would make this easier.) I am somewhat surprised that my code works as often as it does; I haven't seen any errors converting back and forth, even though I expected some to be caused by the shader's conversion of the unsigned int to a float.
The IEEE 754 standard has one bit for the sign, then 8 bits for the exponent, then 23 bits for the mantissa. (I re-ordered that so that the 8-bit exponent would go entirely into the alpha channel, thereby avoiding a calculation or two.)
If the n-th binary digit of the mantissa is \( b_n \), then the mantissa is defined as
\[ \text{mantissa} = \sum_{n=1}^{23} b_n 2^{-n}, \]
and a floating point value \( x \) is represented as
\[ x = (-1)^{\text{sign}} \, (1 + \text{mantissa}) \times 2^{\text{exponent} - 127}. \]
The first problem to solve is finding the mantissa and exponent when given a floating point number (the sign is straightforward, so assume that \( x \) is positive). The mantissa is between 0 (inclusive) and 1 (exclusive), so it follows that \( 0 \leq \log_2(1 + \text{mantissa}) < 1 \), and therefore that \( \lfloor \log_2(x) \rfloor = \text{exponent} - 127 \). Having found the exponent, the mantissa is simply calculated as \( x / 2^{\text{exponent} - 127} - 1 \).
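For example, if \( x = 5.5 \), then \( \lfloor \log_2(5.5) \rfloor = 2 \), so the stored exponent is 129 and the mantissa is \( 5.5 / 2^2 - 1 = 0.375 \).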
A value of zero is stored as 32 zeroes, and I didn't bother handling special cases like NaN or Inf. The remaining steps are to spread the sign bit and the 23 bits of the mantissa over three 8-bit integers, which is an exercise in paying careful attention to powers of 2 and which is not something I care to explain. Here are the GLSL functions; note that my native thought is in 0-255 but the shaders work in 0-1, so there are some conversions that could be simplified. (Equivalent JavaScript functions are also needed, but they're the same calculations so I haven't shown them.)
vec4 value_to_pixel(float x) {
if (x == 0.0) { return vec4(0.0, 0.0, 0.0, 0.0); }
float s = 0.0;
if (x < 0.0) {
x *= -1.0;
s = 1.0;
}
float lb_x = floor(log2(x));
float e = 127.0 + lb_x;
float m = x/exp2(lb_x) - 1.0;
m *= 128.0;
float r = floor(m) + 128.0*s;
m -= r - 128.0*s;
m *= 256.0;
float g = floor(m);
m -= g;
m *= 256.0;
float b = floor(m);
return vec4(r/255.0, g/255.0, b/255.0, e/255.0);
}
float pixel_to_value(vec4 p) {
if (p.r + p.g + p.b + p.a == 0.0) { return 0.0; }
float s = 1.0;
float r = p.r * 255.0;
float g = p.g * 255.0;
float b = p.b * 255.0;
float e = p.a * 255.0;
if (r >= 128.0) {
s = -1.0;
r -= 128.0;
}
float m = r/128.0 + g/32768.0 + b/8388608.0;
return s*(1.0 + m)*exp2(e - 127.0);
}
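For what it's worth, here is a JavaScript sketch of the same conversions, working directly with 0-255 integers the way the canvas code above does (these are illustrative versions of mine – the page's own functions may differ in the details):
function value_to_pixel_js(x) {
    // Pack a float x into [r, g, b, a] bytes (0-255), mirroring the GLSL above.
    if (x === 0) { return [0, 0, 0, 0]; }
    var s = 0;
    if (x < 0) { x = -x; s = 1; }
    var lb_x = Math.floor(Math.log2(x));
    var e = 127 + lb_x;
    var m = x / Math.pow(2, lb_x) - 1;   // mantissa, in [0, 1)
    m *= 128;
    var r = Math.floor(m) + 128*s;       // sign bit plus top 7 bits of mantissa
    m -= r - 128*s;
    m *= 256;
    var g = Math.floor(m);               // next 8 bits of mantissa
    m -= g;
    m *= 256;
    var b = Math.floor(m);               // last 8 bits of mantissa
    return [r, g, b, e];
}
function pixel_to_value_js(p) {
    // Unpack [r, g, b, a] bytes (0-255) back into a float.
    if (p[0] + p[1] + p[2] + p[3] === 0) { return 0; }
    var s = 1;
    var r = p[0];
    if (r >= 128) { s = -1; r -= 128; }
    var m = r/128 + p[1]/32768 + p[2]/8388608;
    return s * (1 + m) * Math.pow(2, p[3] - 127);
}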
Comparing a value x created in JavaScript with pixel_to_value(value_to_pixel(x)) consistently gives me errors only at the 7th significant decimal figure, but I haven't tested it systematically across different orders of magnitude, so perhaps there are gremlins waiting to be found somewhere.
GLSL doesn't provide a random number generator, which appears to have been a cause for some frustration given the number of upvotes given to this old Stack Overflow answer. For many graphics applications, the purpose of the random numbers is simply to generate a noisy-looking background, and the answer proposed in the link (multiplying the coordinates by a large number, then taking the absolute value of the sine) is fine. But I want a steady stream of decent-enough standard Gaussian random numbers for use in stochastic DE solving, so I need something a little more sophisticated.
I am almost certainly re-inventing a wheel here, but existing libraries (example) look intimidating to me and I don't even know if they do what I want. So I dug up my fourth-year computational physics notes and read over the chapter on pseudo-random number generators, and decided to write a Lagged Fibonacci Generator.
An LFG is usually defined over integers modulo some power of 2, but my PHYS4070 notes assure me that I can use floating point numbers on [0, 1]. Two parameters are required, which I will call (following my notes!) \( r \) and \( s \lt r \), and which need to be chosen carefully to ensure that the RNG has a large period. After defining \( r \) initial seeds (all in [0, 1] as I am implementing it), subsequent random numbers are defined by \( x_n = x_{n - r} + x_{n - s}\, (\text{mod } 1) \), though the 'plus' can be changed to any of several other operators.
For whatever reason, SDE solving doesn't require great random numbers, and using \( (r, s) = (17, 5) \) gives satisfactory results even though parameters like 17 and 5 look awfully small. I want to have \( r \) fairly small, because the LFG requires storing the previous \( r \) values, and I don't want my RNG texture to be gigantic – in the Ising model calculation, I have 200,000 separate LFG's running, which means at least 3.4 million pixels in the texture. That doesn't push the limits of what's possible (and if necessary I could always split up the textures), especially on desktop computers, but I didn't want to go overboard, and I wanted it to run reasonably quickly.
A simple way to make the LFG work (thank you, PHYS4070) is to have an "array" of the 17 current random numbers, and then use another variable to point to which entry is the current \( n \). At each iteration, entry \( n \) is updated, and the pointer moves on by one, wrapping around when it reaches the end of the array.
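In plain JavaScript, ignoring textures entirely, the idea looks something like this – a sketch of mine rather than the page's code, with the pointer stepping downwards through the array as it does in the shader below:
function make_lfg(seeds, r, s) {
    // seeds: array of r initial values in [0, 1). Returns a function that gives
    // the next random number, x_n = x_{n-r} + x_{n-s} (mod 1).
    var x = seeds.slice();
    var n = 0;  // index of the entry to overwrite next (it holds x_{n-r})
    return function() {
        var lagged = (n + s) % r;        // entry holding x_{n-s}
        x[n] = (x[n] + x[lagged]) % 1;
        var out = x[n];
        n = (n - 1 + r) % r;             // step the pointer, wrapping around
        return out;
    };
}
// var next_rand = make_lfg(seeds, 17, 5);  // next_rand() gives a value in [0, 1)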
Following this logic, I actually used 18 pixels per LFG (this is probably wasteful – all the LFG's get updated at the same time, so their pointers should always be the same). I think pretty much every device can handle textures up to 2048 × 2048 pixels, so I put 100 LFG's in each row of the texture, and made 2000 rows. This means that the first step in the shader will be locating which LFG the current pixel is in. After that, it reads the pointer pixel for the current LFG, and if the current pixel needs to be updated, it does the update. Since the subsequent transformation to standard Gaussians requires two random numbers, I do two iterations of the LFG instead of one. The pixel_to_value() and value_to_pixel() functions are not shown below but need to be defined in the shader (in the JavaScript I store the string holding these functions separately and prepend it to the shader script elements in the HTML).
After many floating point mishaps combined with either floor() or mod() – the uv2 vec2 gives the current pixel coordinates on a 0-1 scale – I got paranoid about making sure I was calculating integers correctly, and frequently started adding 0.1 to values that are supposed to be integers (and subsequently taking the floor). This code feels quite complicated for something so simple, and I can only assure my future self that it will make sense after careful study.
uniform sampler2D rng_texture;
varying vec2 uv2;
#define lfg_r 17.0
#define lfg_s 5.0
#define num_rand_cols 100.0
#define height 2000.0
const float width = num_rand_cols * (lfg_r + 1.0);
const float text_du = 1.0 / width;
const float text_dv = 1.0 / height;
const float text_tol = 0.25 / width;
void main() {
// Default to returning the current pixel; only three pixels
// in each LFG need updating.
gl_FragColor = texture2D(rng_texture, uv2);
// Which LFG column are we in?
float n_pixel = floor(uv2.x * width);
float col = floor((n_pixel + 0.1) / (lfg_r + 1.0));
// Which pixel is being pointed at?
float ptr_px_u = ((lfg_r + 1.0)*col + lfg_r + 0.5) / width;
float ptr_px = pixel_to_value(texture2D(rng_texture, vec2(ptr_px_u, uv2.y)));
float ptr_px_lagged = ptr_px + lfg_s;
if (ptr_px_lagged > lfg_r) { ptr_px_lagged -= lfg_r; }
float ptr_px_next = ptr_px - 1.0;
if (ptr_px_next < 0.0) { ptr_px_next += lfg_r; }
float ptr_px_lagged_next = ptr_px_next + lfg_s;
if (ptr_px_lagged_next > lfg_r) { ptr_px_lagged_next -= lfg_r; }
// Use floor() because of the 0.1 added to the pointer integer.
float ptr = floor(ptr_px) * text_du;
float ptr_lagged = floor(ptr_px_lagged) * text_du;
float ptr_next = floor(ptr_px_next) * text_du;
float ptr_lagged_next = floor(ptr_px_lagged_next) * text_du;
float u_base = ((lfg_r + 1.0)*col + 0.5) / width;
if (abs(u_base + ptr - uv2.x) < text_tol) {
// First random number to be generated
float x_n = pixel_to_value(texture2D(rng_texture, vec2(u_base + ptr, uv2.y)));
float x_lagged = pixel_to_value(texture2D(rng_texture, vec2(u_base + ptr_lagged, uv2.y)));
x_n += x_lagged;
if (x_n > 1.0) { x_n -= 1.0; }
gl_FragColor = value_to_pixel(x_n);
} else if (abs(u_base + ptr_next - uv2.x) < text_tol) {
// Second random number to be generated
float x_n = pixel_to_value(texture2D(rng_texture, vec2(u_base + ptr_next, uv2.y)));
float x_lagged = pixel_to_value(texture2D(rng_texture, vec2(u_base + ptr_lagged_next, uv2.y)));
x_n += x_lagged;
if (x_n > 1.0) { x_n -= 1.0; }
gl_FragColor = value_to_pixel(x_n);
} else if (abs(ptr_px_u - uv2.x) < text_tol) {
// Pixel storing the pointer.
ptr_px -= 2.0;
if (ptr_px < 0.0) { ptr_px += lfg_r; }
gl_FragColor = value_to_pixel(ptr_px);
}
}
A Box-Muller transform takes two random numbers uniform on [0, 1] and outputs a standard Gaussian. (Here, as I have been throughout this post, I am being very loose with square and round brackets on my intervals.) In terms of shader and buffer structure, this transform is the easiest in the whole Ising model calculation, because the Box-Muller shader doesn't need to read its own values – only those of the LFG – and so there only needs to be one WebGLRenderTarget for it, and no swapping of buffers.
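Concretely, the version used in the shader below maps two uniform variates \( u_1, u_2 \) to \( z = \sqrt{-2 \ln u_1} \, \cos(2\pi u_2) \), which is a standard Gaussian.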
The shader code itself requires a bit of careful algebra. The 100 columns and 2000 rows of LFG's are re-arranged to a texture of width 500 pixels and height 400 pixels, so there's a bit of rigmarole to work out which LFG to read given the current pixel location. Since the LFG pointer pixel had 2 subtracted from it in the LFG shader (to be ready for the next LFG iteration), the Box-Muller shader adds the 2 back so that it points at the first of the two most-recently-generated random numbers. Once again there are many signs of my earlier floating point versus integer bugs.
uniform sampler2D rng_texture;
varying vec2 uv2;
#define tau 6.283185307179586
#define lfg_r 17.0
#define lfg_s 5.0
#define num_rand_cols 100.0
#define height 2000.0
#define output_width 500.0
#define output_height 400.0
const float width = num_rand_cols * (lfg_r + 1.0);
const float text_du = 1.0 / width;
const float text_dv = 1.0 / height;
void main() {
float output_i = floor(uv2.y * output_height);
float output_j = floor(uv2.x * output_width);
float N = output_i * output_width + output_j;
float input_i = floor((N + 0.1) / num_rand_cols);
float input_j = floor(mod(N + 0.1, num_rand_cols));
float input_v = (input_i + 0.5) / height;
float u_base = ((lfg_r + 1.0)*input_j + 0.5) / width;
float ptr_px_u = ((lfg_r + 1.0)*input_j + lfg_r + 0.5) / width;
float ptr1_px = pixel_to_value(texture2D(rng_texture, vec2(ptr_px_u, uv2.y))) + 2.0;
if (ptr1_px > lfg_r + 0.05) { ptr1_px -= lfg_r; }
float ptr1 = floor(ptr1_px) * text_du;
float ptr2_px = ptr1_px - 1.0;
if (ptr2_px < 0.0) { ptr2_px += lfg_r; }
float ptr2 = floor(ptr2_px) * text_du;
float r1 = pixel_to_value(texture2D(rng_texture, vec2(u_base + ptr1, input_v)));
float r2 = pixel_to_value(texture2D(rng_texture, vec2(u_base + ptr2, input_v)));
// This doesn't feel like best practice:
if (r1 == 0.0) { r1 = 1.0e-6; }
gl_FragColor = value_to_pixel(sqrt(-2.0*log(r1)) * cos(tau*r2));
}
The goal of this section is to reproduce – live, in your browser, should you want to eat up some computer resources – Figure 4 from this paper, which calculates nearest-neighbour spin-spin correlations \( \langle \hat{\sigma}^z_j \hat{\sigma}^z_k \rangle \) for a 10 × 10 Ising model. The technique is quite a lot more computationally efficient than I thought at the time, since it turns out that I was using a much smaller step-size than necessary – my log book tells me I used \( \Delta t = 0.00125 \), and on this page I've used \( \Delta t = 0.05 \) with no visible loss of accuracy in the ensemble averages.
(How do you know what the right time step is in SDE solving, anyway? You can't halve the step and compare the difference between one big step and two little steps, because the second little step will have different noise terms. I can see why 2007-me thought that 0.05 was too large – the noise term is a standard Gaussian random variable multiplied by sqrt(0.05) ≈ 0.22, which feels awfully big for something that's supposed to approximate an infinitesimal.)
That the calculation doesn't need to be as intensive as I made it in 2007 means that its WebGL version is less impressive than I initially thought. It runs about 8 times faster than my C code (which is not nothing!), but when I first got it working it felt at least a hundred times faster than I remembered, because the larger time step was giving me a 40x speed-up straight off the bat.
For the gory details, you can read the paper*; it seems to have picked up a couple of citations and my very brief skim suggests that Ng and Sørensen is the most interesting along similar lines (the cite comes right near the end of the conclusion, and I wonder if I can thank a referee for it).
*Figure 2 is mis-captioned and mis-described in the text; that graph is for a one-dimensional lattice with N=50, a much more impressive calculation than what it purports to be. It's a measure of how completely uninterested I had become in what was supposed to be my PhD that I either didn't even read the paper that has my name as first author, or that I did read it and cared so little that I let that inaccuracy slide as I instead went to teach English in France.
And while we're here, the caption to Figure 4 says it involved 1000 trajectories, but my log book and thesis both say 850.
Long story short, let \( R_j \) be defined at each lattice point, let angle brackets denote neighbouring lattice points, let \( \beta = 1/T \) be the inverse temperature (Boltzmann's constant is equal to 1), \( J \) the coupling strength between neighbouring lattice sites, and \( \xi_{jk}(t) \) be delta-correlated noise terms that exist for every pair of coupled lattice sites and whose indices are unordered. Then if the \( R_j \) are evolved according to
\[ \frac{dR_j}{dt} = -R_j + 4\beta J \tanh(R_j) + \beta J \sum_{\langle jk \rangle} \tanh(R_k) + \sqrt{2\beta J} \sum_{\langle jk \rangle} \xi_{jk}(t), \]
then the distribution of the \( R_j \) over an ensemble of trajectories will converge so that in steady-state, the spin-spin correlations at the given temperature can be derived by taking ensemble averages of products of the \( \tanh(R_j) \). That is, \( \langle \hat{\sigma}^z_j \hat{\sigma}^z_k \rangle = \langle \tanh(R_j) \tanh(R_k) \rangle \), where the left-hand-side angle brackets denote a quantum expectation value and the right-hand-side angle brackets denote an ensemble average.
(You can throw in a magnetic field term as well if you want. Since the noise terms are multiplied only by constants, the Itô and Stratonovich forms of the equations are identical.)
This is exceptionally well-suited to a WebGL calculation. Trajectories evolving independently are inherently parallelisable anyway, but furthermore each \( R_j \)'s evolution only depends on itself and its four neighbours – there doesn't need to be a loop over every lattice site, which in WebGL terms means that there doesn't need to be a loop over a relatively large number of pixels, or indeed any loop at all.
Below is a correlation-versus-temperature graph for the infinite 2D Ising model; click on 'Start calculation' to start it running, and it should work its way through seven different values of \( \beta \) for the 10 × 10 case. The critical temperature is at \( \beta \approx 0.44 \), and the discrepancy between the infinite lattice and the finite lattice is quite large there. It takes less than 10 minutes on my laptop in Chrome to get through all of the points; Firefox is a little quicker, but it also makes the rest of the computer less responsive. Internet Explorer reports one hundred errors to the JavaScript console before it stops bothering.
The nearest-neighbour spin-spin correlations are averaged over all neighbouring pairs of lattice sites. The calculation runs 100 time steps between statistics reports (the statistics are calculated in the JavaScript and are slow compared to the GPU part of the calculation). It guesses that the system has converged if the correlation is lower than it was 20 reporting steps ago, and then it runs another 10 reporting steps so as not to bias the result low. This choice is overly conservative at high temperatures, where convergence is quick, but convergence at low temperature is slow, and the calculation might end early with an under-estimate of the correlation.
There is no time-averaging after it reaches convergence; I can't be bothered working out how long it takes for stats reports to be independent of one another. The scatterplot points are drawn as red squares all of the same size; "standard errors" are given in the table below the plot, and I've put the term in scare quotes because in calculating them, I assumed that each pair-wise correlation is independent of the others, which seems a very dubious assumption.
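The statistics code itself isn't shown on this page, but a sketch of the sort of thing involved (my own illustration, using the pixel_to_value_js() sketch from earlier and readRenderTargetPixels(), not the page's actual code) might look like this:
function mean_nn_correlation(renderer, buffer, width, height, N_side) {
    // Read the variables buffer back and average tanh(R_j)*tanh(R_k) over
    // horizontal and vertical neighbour pairs (periodic boundaries) and over
    // all trajectories. Each trajectory is N_side*N_side consecutive pixels.
    var N = N_side * N_side;
    var pixels = new Uint8Array(4 * width * height);
    renderer.readRenderTargetPixels(buffer, 0, 0, width, height, pixels);
    var total = 0;
    var count = 0;
    for (var p = 0; p < width * height; p += N) {
        var R = [];
        for (var k = 0; k < N; k++) {
            R.push(pixel_to_value_js(pixels.subarray(4*(p + k), 4*(p + k) + 4)));
        }
        for (var i = 0; i < N_side; i++) {
            for (var j = 0; j < N_side; j++) {
                var here = Math.tanh(R[i*N_side + j]);
                var right = Math.tanh(R[i*N_side + (j + 1) % N_side]);
                var down = Math.tanh(R[((i + 1) % N_side)*N_side + j]);
                total += here*(right + down);
                count += 2;
            }
        }
    }
    return total / count;
}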
The code uses requestAnimationFrame() (one call per reporting step), so calculations will pause if you change tabs. Calculations may then resume if you return to this tab, but the browser might equally have thrown away something important from memory, with subsequent calculations returning zero.
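Schematically, the driving loop is something like this (the function names here are placeholders of mine, not the page's actual ones):
function report_step() {
    run_gpu_steps(steps_between_reports);   // the main SDE loop shown below
    update_statistics();                    // readback and averaging in the JavaScript
    if (!finished()) {
        requestAnimationFrame(report_step); // one frame per reporting step
    }
}
requestAnimationFrame(report_step);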
I don't recommend running it on mobile – one person has told me that it makes their iPhone 6 quite hot, and it gave wrong answers anyway. It also exhibits bizarre behaviour on my Galaxy S3.
The SDE's are integrated using the semi-implicit method. A semi-implicit step forward from an initial vector of values \( \mathbf{R}_n \) to \( \mathbf{R}_{n+1} \) goes like this (\( f(\mathbf{R}) \) is the RHS of the SDE system):
1. Copy the current values into the intermediate vector: \( \mathbf{R}^* = \mathbf{R}_n \).
2. Iterate a few times: \( \mathbf{R}^* \leftarrow \mathbf{R}_n + \tfrac{1}{2} f(\mathbf{R}^*)\, \Delta t + \tfrac{1}{2} (\text{noise terms}) \sqrt{\Delta t} \).
3. Take the full step: \( \mathbf{R}_{n+1} = 2\mathbf{R}^* - \mathbf{R}_n \).
The above sketch doesn't specify how many of those iteration loops to do; it's supposed to be enough so that it converges, and I chose 3 for my code without bothering to check what the error is.
There are three distinct steps in the above procedure, and each gets its own fragment shader (and material in the JavaScript). First the current variable values \( \mathbf{R}\, (= \mathbf{R}_n) \) have to be copied into the intermediate values \( \mathbf{R}^* \), then \( \mathbf{R}^* \) is updated (inside a JavaScript loop) using \( \mathbf{R} \), and then finally the new values \( \mathbf{R}\, (= \mathbf{R}_{n+1}) \) are updated using \( \mathbf{R}^* \).
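The copy step only needs to write its input texture straight through to the output; the actual shader_fragment_copy_texture isn't shown on this page, but it presumably looks much like this:
varying vec2 uv2;
uniform sampler2D texture;
void main() {
    gl_FragColor = texture2D(texture, uv2);
}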
The only tricky shader to write is the semi-implicit iteration. Each \( R_j \) equation has terms coming from itself, each of its four neighbouring \( R_k \) values, and a noise term from each of its four coupling links. The fragment shader is doing this for each of 1000 trajectories, and so the current pixel needs to be carefully located so that it talks to the correct noise and variable terms. The 100 lattice sites for each trajectory are all stored in a single row of the variables texture, so the "up" and "down" nearest neighbours are offset by 10 pixels horizontally (or 90 pixels, if wrapping around the top/bottom row).
There are two noise terms per lattice site (horizontal and vertical), and the Gaussian RNG texture handles these by having a height double that of the variables texture, with horizontal link noises in the top half and vertical link noises in the bottom half.
Once again there are relics of my earlier floating-point-induced bugs, and I don't know how many of the 0.1's etc. are necessary.
varying vec2 uv2;
uniform sampler2D vars_intermediate;
uniform sampler2D vars_old;
uniform sampler2D rng;
uniform float dt;
uniform float sqrt_dt;
uniform float beta;
uniform float J;
#define N_side 10.0
#define output_width 500.0
#define output_height 200.0
const float N = N_side * N_side;
const float input_du = 1.0 / output_width;
const float input_dv = 1.0 / output_height;
float tanh(float x) {
return (exp(x) - exp(-x)) / (exp(x) + exp(-x));
}
void main() {
// Need to know where we are horizontally in the fragment so that
// the boundary lattice sites can be handled properly, instead of
// coupling to the next trajectory or to a blank value off the end
// of the texture.
float output_j = floor(output_width * uv2.x);
float j = floor(mod(output_j + 0.1, N));
float local_i = floor((j + 0.1) / N_side);
float local_j = floor(mod(j + 0.1, N_side));
float du_right = input_du;
float du_left = -input_du;
float du_up = -N_side * input_du;
float du_down = N_side * input_du;
// I don't trust (local_j == 0.0) but maybe it works.
if (abs(local_j) < 0.25) {
du_left = (N_side - 1.0)*input_du;
}
if (abs(local_j - N_side + 1.0) < 0.25) {
du_right = -(N_side - 1.0)*input_du;
}
if (abs(local_i) < 0.25) {
du_up = N_side * (N_side - 1.0) * input_du;
}
if (abs(local_i - N_side + 1.0) < 0.25) {
du_down = -N_side * (N_side - 1.0) * input_du;
}
float input_x = uv2.x;
float input_y = uv2.y;
float R = pixel_to_value(texture2D(vars_intermediate, vec2(input_x, input_y)));
float R_left = pixel_to_value(texture2D(vars_intermediate, vec2(input_x + du_left, input_y)));
float R_right = pixel_to_value(texture2D(vars_intermediate, vec2(input_x + du_right, input_y)));
float R_up = pixel_to_value(texture2D(vars_intermediate, vec2(input_x + du_up, input_y)));
float R_down = pixel_to_value(texture2D(vars_intermediate, vec2(input_x + du_down, input_y)));
float R_i = pixel_to_value(texture2D(vars_old, vec2(input_x, input_y)));
float xi_1 = pixel_to_value(texture2D(rng, vec2(input_x, input_y/2.0)));
float xi_2 = pixel_to_value(texture2D(rng, vec2(input_x + du_right, input_y/2.0)));
float xi_3 = pixel_to_value(texture2D(rng, vec2(input_x, input_y/2.0 + 0.5)));
float xi_4 = pixel_to_value(texture2D(rng, vec2(input_x + du_down, input_y/2.0 + 0.5)));
float dR = (-R + 4.0*beta*J*tanh(R) + beta*J*(tanh(R_right) + tanh(R_left) + tanh(R_up) + tanh(R_down)))*dt + sqrt(2.0*beta*J)*(xi_1 + xi_2 + xi_3 + xi_4)*sqrt_dt;
gl_FragColor = value_to_pixel(R_i + 0.5*dR);
}
The final step-forward shader itself is a nice easy relief.
varying vec2 uv2;
uniform sampler2D vars_prev;
uniform sampler2D vars_intermediate;
void main() {
float x1 = pixel_to_value(texture2D(vars_prev, uv2));
float x2 = pixel_to_value(texture2D(vars_intermediate, uv2));
gl_FragColor = value_to_pixel(2.0*x2 - x1);
}
And in the JavaScript, it looks like this (including the tedious initialisation that could probably be tidied up with a function; the main SDE solver loop is at the end):
/* ~~~ Initialisation of the main variables ~~~ */
var vars_canvas_obj = init_vars_texture(gl_all.vars_width, gl_all.vars_height, 0);
var vars_canvas = vars_canvas_obj.canvas_map;
var vars_canvas_alpha = vars_canvas_obj.canvas_alpha_map;
var vars_texture_init = new THREE.Texture(vars_canvas);
vars_texture_init.needsUpdate = true;
vars_texture_init.minFilter = THREE.NearestFilter;
vars_texture_init.magFilter = THREE.NearestFilter;
var vars_texture_init_alpha = new THREE.Texture(vars_canvas_alpha);
vars_texture_init_alpha.needsUpdate = true;
vars_texture_init_alpha.minFilter = THREE.NearestFilter;
vars_texture_init_alpha.magFilter = THREE.NearestFilter;
var material_vars_init = new THREE.ShaderMaterial({
"uniforms": {
"texture": {"type": "t", "value": vars_texture_init},
"alpha": {"type": "t", "value": vars_texture_init_alpha},
},
"vertexShader": document.getElementById("shader_vertex_default").textContent,
"fragmentShader": document.getElementById("shader_fragment_texture_alpha").textContent
});
quad = new THREE.Mesh(plane, material_vars_init);
gl_all.scene_vars_init = new THREE.Scene();
gl_all.scene_vars_init.add(quad);
/* ~~~ Setup of the main variables ~~~ */
// Two buffers for the main variables
gl_all.vars1_buffer = new THREE.WebGLRenderTarget(
gl_all.vars_width,
gl_all.vars_height,
{"minFilter": THREE.NearestFilter, "magFilter": THREE.NearestFilter, "format": THREE.RGBAFormat});
gl_all.vars2_buffer = new THREE.WebGLRenderTarget(
gl_all.vars_width,
gl_all.vars_height,
{"minFilter": THREE.NearestFilter, "magFilter": THREE.NearestFilter, "format": THREE.RGBAFormat});
// Two buffers for the variables stored temporarily during the
// DE step-forward:
gl_all.vars_intermediate1_buffer = new THREE.WebGLRenderTarget(
gl_all.vars_width,
gl_all.vars_height,
{"minFilter": THREE.NearestFilter, "magFilter": THREE.NearestFilter, "format": THREE.RGBAFormat});
gl_all.vars_intermediate2_buffer = new THREE.WebGLRenderTarget(
gl_all.vars_width,
gl_all.vars_height,
{"minFilter": THREE.NearestFilter, "magFilter": THREE.NearestFilter, "format": THREE.RGBAFormat});
gl_all.material_vars = new THREE.ShaderMaterial({
"uniforms": {
"vars_prev": {"type": "t", "value": gl_all.vars2_buffer.texture},
"vars_intermediate": {"type": "t", "value": gl_all.vars_intermediate2_buffer.texture}
},
"vertexShader": document.getElementById("shader_vertex_default").textContent,
"fragmentShader": shader_aux_fns + document.getElementById("shader_fragment_si_step").textContent
});
gl_all.material_copy_vars_to_intermediate = new THREE.ShaderMaterial({
"uniforms": {"texture": {"type": "t", "value": gl_all.vars2_buffer.texture}},
"vertexShader": document.getElementById("shader_vertex_default").textContent,
"fragmentShader": document.getElementById("shader_fragment_copy_texture").textContent
});
gl_all.material_vars_intermediate = new THREE.ShaderMaterial({
"uniforms": {
"vars_old": {"type": "t", "value": gl_all.vars2_buffer.texture},
"vars_intermediate": {"type": "t", "value": gl_all.vars_intermediate2_buffer.texture},
"rng": {"type": "t", "value": gl_all.rng_gaussian_buffer.texture},
"dt": {"type": "f", "value": gl_all.dt},
"sqrt_dt": {"type": "f", "value": gl_all.sqrt_dt},
"beta": {"type": "f", "value": gl_all.beta},
"J": {"type": "f", "value": gl_all.J},
"debug": {"type": "f", "value": 0.0}
},
"vertexShader": document.getElementById("shader_vertex_default").textContent,
"fragmentShader": shader_aux_fns + document.getElementById("shader_fragment_si_iterate").textContent
});
quad = new THREE.Mesh(plane, gl_all.material_vars);
gl_all.scene_vars = new THREE.Scene();
gl_all.scene_vars.add(quad);
quad = new THREE.Mesh(plane, gl_all.material_copy_vars_to_intermediate);
gl_all.scene_copy_vars_to_intermediate = new THREE.Scene();
gl_all.scene_copy_vars_to_intermediate.add(quad);
quad = new THREE.Mesh(plane, gl_all.material_vars_intermediate);
gl_all.scene_vars_intermediate = new THREE.Scene();
gl_all.scene_vars_intermediate.add(quad);
// Initial renders:
gl_all.renderer.render(scene_rng_init, gl_all.camera, gl_all.rng1_buffer, true);
gl_all.renderer.render(gl_all.scene_vars_init, gl_all.camera, gl_all.vars1_buffer, true);
gl_all.renderer.render(gl_all.scene_copy_vars_to_intermediate, gl_all.camera, gl_all.vars_intermediate1_buffer, true);
swap_buffers(gl_all, "rng");
swap_buffers(gl_all, "vars");
swap_buffers(gl_all, "vars_intermediate");
// ... other code ...
/* ~~~ Main loop ~~~ */
for (var loop_ct = 0; loop_ct < steps_between_reports; loop_ct++) {
// Generate new [0-1]-uniform random numbers and swap buffers:
gl_all.renderer.render(gl_all.scene_rng, gl_all.camera, gl_all.rng1_buffer, true);
swap_buffers(gl_all, "rng");
// Use the newly-generated [0-1]-uniform random numbers to
// generate standard Gaussian random numbers:
gl_all.renderer.render(gl_all.scene_rng_gaussian, gl_all.camera, gl_all.rng_gaussian_buffer, true);
// Update material_vars_intermediate, which is used in the
// semi-implicit solver, so that it can read the new Gaussian
// random numbers:
gl_all.material_vars_intermediate.uniforms.rng.value = gl_all.rng_gaussian_buffer.texture;
// Copy the current values of the main variables to the
// intermediate buffer for use in iteration, and swap buffers:
gl_all.renderer.render(gl_all.scene_copy_vars_to_intermediate, gl_all.camera, gl_all.vars_intermediate1_buffer, true);
swap_buffers(gl_all, "vars_intermediate");
// Iterate the semi-implicit solver:
for (i = 0; i < gl_all.si_iter_steps; i++) {
gl_all.renderer.render(gl_all.scene_vars_intermediate, gl_all.camera, gl_all.vars_intermediate1_buffer, true);
swap_buffers(gl_all, "vars_intermediate");
}
// The final step-forward:
gl_all.renderer.render(gl_all.scene_vars, gl_all.camera, gl_all.vars1_buffer, true);
swap_buffers(gl_all, "vars");
gl_all.t += gl_all.dt;
}
I've skipped over some minor details in the code excerpts, but since this page has a live demo, you can always view the source for the full story.
The first bit of setup is right at the end of the code, before calls to init_graph() (for the on-screen display) and init_calculation(). Clicking on that 'Start calculation' pseudo-link gets main_calculation() going, which holds the main loop and does the stats calculations.
My C code (which just does a single temperature for some number of steps, without any convergence testing) is here.
Was that fun? I'm not so sure – getting it going from scratch was a frustratingly slow process, and organising every byte of every variable's value individually felt like what I imagine 1960s programming to have been like. Debugging the shaders is tedious, because the only way to know what value a variable has inside a shader is to output its value to a pixel. The fun derived from solving a physics problem for its own sake – the sort of satisfaction that once made me want to make a career of it – was balanced at times by flashes of intense frustration that recalled the reasons why I quit. But maybe it'll all be easier next time. At least I've sunk some cost into building up the skills I need to write my game.
Posted 2016-11-03,
updated 2016-11-04 (warning about heating on mobile),
updated 2016-11-05 (minor editing).