Foundations of Random Number Generation in JavaScript

Tim Brock / Thursday, July 14, 2016

Being able to generate (apparently) random numbers is an essential requirement for many areas of math, statistics, the sciences, technology and gaming. For instance, they can be used to assign participants to a group in a randomized controlled trial, to determine the output of a computer game or to find approximate solutions to otherwise intractable problems. I frequently use random numbers to check my solutions to perfectly tractable statistical problems too. Cryptographic random number generators, which we won't cover here, can be used in security.

Types of Random Number Generator

For the purpose of this article at least, we can think of there being three categories of random-number collection:

"True" random numbers;
Pseudorandom numbers;
Quasirandom sequences.

True (or hardware) random number generators (TRNGs) use real-world physical processes believed to be random in nature to create streams of numbers. Decay events from a radioactive source, for example, are random and uncorrelated with each other, atmospheric noise can also be used.

TRNGs often aren't practical or convenient (or necessary) for many purposes. A far more common alternative is to create a stream of numbers that appear to be randomly distributed over some interval using a computer algorithm. These algorithms are not truly unpredictable and so are referred to as pseudorandom number generators (PRNGs).

Finally, quasirandom sequences are a finite collection of numbers that are meant to be representative of a sample space in some way. For example, the mean of the sequence may be the same (or very similar to) the known mean of the population. While quasirandom sequences are interesting and can be useful, they're not the focus of this article and won't be discussed further.

The Standard Uniform Distribution

Generally, the output of a large number of values from a pseudorandom number generator is meant to approximate the standard uniform distribution (SUD) that covers the range 0 to 1. There is, however, some variation in whether either/both of the end points are included. In the conceptual world of the mathematics of continuous distributions there is basically no difference between the uniform distribution between [0,1] (includes 0 and 1) and the uniform distribution between (0,1) (excludes 0 and 1). In the real world of floating point numbers, with only a finite number of possible values between 0 and 1, the difference is real and could potentially be problematic. One might, for example, want to use the generated number inside the natural logarithm function Math.log.

Generating a random number for another distribution is usually "just" a matter of using one or more numbers from an SUD PRNG to produce an appropriate value from the distribution of interest. Depending on the desired final distribution, this might involve one line of code or something quite complex. For the rest of this article I will stick to discussing the generation of numbers from a SUD PRNG.

Period

A useful PRNG must have a large period, which is to say that it must be able to output a large number of numbers without repeating itself. For example, the Wichmann Hill PRNG of 1982 (more on this later) has a period of nearly 7 trillion numbers, while the exceedingly popular Mersenne Twister PRNG has a period of 2¹⁹⁹³⁷ − 1 numbers. The former is considered quite short by modern standards.

What's wrong with Math.random?

You may use JavaScript's built-in Math.random regularly and have no problems with it. It does, however, have one big limitation: its output is not reproducible. Run code that utilizes Math.random again and again and you'll get a different set of results each time. In many cases that really isn't a problem. In fact in many (most, probably) cases it'll be exactly what you want. Unpredictability (to a human sat staring at the output at least) is exactly what is required. But sometimes we do want to see the same set of results.

Moving away from JavaScript for a moment, consider running a simulation experiment for a piece of work you wish to publish. Perhaps you're using C++ or Java or R or... You want your results to be reproducible. These languages (and many others besides) offer a way of "seeding" the initial state of their PRNGs. You set the seed in some way or other and you get the same sequence of "random" numbers out. Math.random requires a seed too, there's just no way of setting it yourself. The specification for Math.random is also fairly open-ended, allowing browser vendors to use "an implementation-dependent algorithm" so long as the output is approximately uniform over the range [0,1).

From a personal perspective, I'm a big fan of browser-based interactive data visualization for communicating both data and concepts. This could include simulation; there are limits to what is practical in a browser but web workers can help. Simulation frequently requires random numbers. If the random numbers are not reproducible then the conditions for a simulation can't be re-run. There are plenty of other use cases too, such as a repeatable animation, and JavaScript isn't just the programming language of the browser any more.

Problematic PRNGs

Up until now I've skipped over how uniform-distribution PRNGs work. It's all been something of a black box: you call a function one or more times, possibly setting a seed, and then some pseudorandom numbers come out. The problem is that creating a good PRNG is difficult. There are thousands of papers on the topic and multiple methods. And multiple instances where people who know much more about this topic than I appear to have got things wrong. For instance...

RANDU

RANDU is a "linear congruential generator" (LCG) developed by IBM in the 1950's. LCG's use a recurrence relation of the following form to generate new pseudorandom numbers:

In the case of RANDU, c is 0, a is 65,539 and m is 2³¹. Because c is 0, RANDU is a member of a subset of LCG's called "multiplicative congruential generators" (MCG). To get a number in the range 0 to 1 (as is desired for a replacement for Math.random), one just needs to divide the result of the RANDU recurrence relation by m. A JavaScript implementation (which you definitely shouldn't use!) could look something like this:

var randu = function(seed){

   "use strict";
   
   if(!isFinite(seed)){
      throw new Error("Seed not a finite number");
   }
		   
   var x = Math.round(seed);
   var a = 65539;
   var m = Math.pow(2, 31);

   if(x<1 || x≥m){
      throw new Error("Seed out of bounds");
   }

   return function(){
      x = (a*x)%m;
      return x/m;
   };
    
};

The parity of the value generated by the recurrence relation in RANDU never changes. That is to say, an odd seed gives rise only to odd values of x while an even seed gives rise only to even values of x. This isn't exactly a desirable property, but there are bigger problems.

The period of an LCG is at most m, but for RANDU it is much less than that and depends on the parity of the seed. For odd seeds it's over 536 million but for even seeds it can be as little as 16,384.

There's another reason not to bother with an even seed: One common, simple method for assessing the randomness of a generator is to plot pairs of successive values as a scatterplot. A good PRNG should fill a 1-by-1 square fairly evenly. With 10,000 random numbers, 5,000 points, and a seed of 1 everything looks reasonable. (You can think of "x" as referring to the values in even-index positions in an (0-indexed) array of 10,000 random numbers. That is indices 0, 2, 4, 6... 9,996, 9,998. Points on the scatter plot are made by matching up with the next odd-index (1, 3, 5, 7... 9,997, 9999) "y" value.)

With some even seeds we something quite different. Below is a scatterplot for the seed 32,768.

Clearly we don't just have a deficit of points in the case of even seeds. In some cases, we have unambiguous relationships between neighboring values.

It would be simple enough to adapt the randu function above to reject even seeds, giving an appropriate error message. Unfortunately, odds seeds show structure too. To see this we just need to extend the scatterplot idea to three dimensions by looking at triplets of successive random numbers. 3D scatter plots are frequently pretty useless. RANDU data provides something of an exception to this rule. (Here the indices for "x" values are 0, 3, 6, 9... 9,993, 9,996, the indices for "y" values are 1, 4, 7, 10... 9,994, 9997 and the indices for "z" values are 2, 5, 8, 11... 9,995, 9,998.)

Rather than fill the box roughly evenly, triplets of numbers all lie in one of 15 planes, regardless of the seed! (Actually, for the even seed 32,768 it's worse than this.) We can compare this with results from Math.random (I used Chrome for this); the difference is stark.

One general problem with 3D scatterplots is that the visibility of structure in the plot can depend on the viewing angle. From certain angles the structure in RANDU plot is hidden. This could be the case for Math.random. To help with this issue I created an interactive version that lets you choose a PRNG (and, where appropriate, one or more seeds) and visualizes the results using WebGL. This demo can be found here and the box of random numbers can be rotated (using the mouse) and viewed from different angles. I've yet to find any obvious signs of structure when using Math.random across a number of browsers (Chrome, Firefox, Maxthon, IE, Edge and Opera on the desktop and Safari on iOS).

The appearance of lattice structures in 3 or more dimensions exists for all MCG's, but it is particularly bad for RANDU. It has been known about since the 1960's. Despite this, RANDU was still used in the 1970's and some simulation results from that era should, perhaps, be viewed with skepticism.

Excel

You may be wondering whether problems with PRNG's were confined to the 1960's and '70's? The short answer to this question is "no". After criticism of it's old random number generator, Microsoft changed to the Wichmann–Hill PRNG for Excel 2003. The WH generator (first published in 1982) combines three MCG's to overcome some of the shortcomings of single MCGs (such as a relatively short period and the lattice planes and hyperplanes seen when looking at groups of neighboring values). A quick JavaScript implementation of WH could look like the following:

var wh = function(seeds){
  
   "use strict";

   if(seeds.length<3){
      throw new Error("Not enough seeds");
   }

   var xA = Math.round(seeds[0]);
   var aA = 171;
   var mA = 30269;

   var xB = Math.round(seeds[1]);
   var aB = 172;
   var mB = 30307;

   var xC = Math.round(seeds[2]);
   var aC = 170;
   var mC = 30323;
   
   if(!isFinite(xA) || !isFinite(xB) || !isFinite(xC)){
      throw new Error("Seed not a finite number");
   }

   if(Math.min(xA,xB,xC)<1 || xA≥mA || xB≥mB || xC≥mC){
      throw new Error("Seed out of bounds");
   }  

   return function(){
      xA = (aA*xA)%mA;
      xB = (aB*xB)%mB;
      xC = (aC*xC)%mC;
      return (xA/mA + xB/mB + xC/mC) % 1;
   };
	  
};

Again we can look for structure when plotting sets of neighboring values in two or three dimensions:

While this clearly isn't sufficient to say whether or not we have a good random number generator, the plots above at least look reasonable. (You can also check the three-dimensional box in the WebGL demo mentioned above.) However, the original implementation of the WH algorithm for the RAND function in Excel occasionally spat out negative numbers!

The WH algorithm fails a number of more modern and stringent tests of PRNGs and it seems Microsoft has moved on to using the popular (but rather more complex) Mersenne Twister algorithm.

V8

Earlier I used output from Chrome's implementation of Math.random to illustrate what the distribution of triplets of random numbers should look like when plotted as a three-dimensional scatterplot. However, the algorithm that was used by V8, Chrome's JavaScript engine, was recently shown to be flawed. Specifically, it reproduced the same "random" character strings with far higher frequency than it should have, as this long but informative article describes. V8 developers promptly changed the algorithm used and the issue seems to have been fixed in Chrome since version 49.

Speed

Another consideration before you try to "grow your own" JavaScript PRNG is speed. One should expect Math.random to be highly optimized. For example, some very rough tests using several browsers showed Math.random to be ~3 times quicker at producing 100,000 random numbers than the simple WH implementation above. Having said this, we're still talking of the order of just tens of milliseconds (at least in Chrome and Firefox on my laptop) so this may not be a bottleneck, even if you do require a large number of random numbers. Of course, more complex PRNGs with better statistical properties may be slower.

Conclusions

I've barely touched the surface here. I've only looked at a couple of simple PRNGs and showed that they can still be problematic. There are far more complicated algorithms for PRNGs out there, some that have passed a large number of quite stringent statistical tests. And some of these already have open source JavaScript implementations.

If you don't need reproducibility, Math.random may be just fine for all your random number needs. Either way, if reliability of your random number generator is critical to how your site functions then you should perform relevant checks. In the case of Math.random this means checking in all target browsers since the JavaScript specification does not specify a particular algorithm that vendors must use.

Try our JavaScript HTML5 controls for your web apps and take immediate advantage of their stunning data visualization capabilities. Download Free Trial today.