Interesting Results from Testing Random Number Generation Assertions

Taking Ken’s advice from my previous post, I decided to write a small test program to verify that the RNG method touted as the best really was the best.

To recap, there are three methods:
The first method involves using the modulus operator: int pickedNum = rand() % range;
The better method involves some trickery to get past the low-order bits problem: int pickedNum = rand() / (RAND_MAX / range + 1);
And the best method involves a loop discarding the values that are not within the range:

int randDivisor = RAND_MAX / MAX_RANGE;  // size of each bucket of rand() values
unsigned int numPicked;
do
{
    numPicked = rand() / randDivisor;    // candidate value; may land at or above MAX_RANGE
}
while (numPicked >= MAX_RANGE);          // if so, throw it away and pick again

I wrote up some code to seed the C++ random number generator, run through the loop for a certain sample size for each method, and then output the results. I was expecting that the best method described in the article would show a better, more even distribution. My test checked the distribution over a range of numbers. I had a vector that stored the number of times a value within that range was picked. That is, if the RNG method output a value of 3, I would increment the value in the vector associated with 3.

I also stored the start time and end time for each loop. Besides accuracy, I think speed would be important for game developers.
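
Roughly, the harness looked like the sketch below. This is a simplified illustration rather than the exact test code (which you can download at the end of the post); names like SAMPLE_SIZE and counts are just placeholders.

// Simplified harness sketch: seed once, pick SAMPLE_SIZE values with one of
// the methods, count how often each value comes up, and time the loop.
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

const unsigned int MAX_RANGE = 10000;        // range of values to pick from
const unsigned int SAMPLE_SIZE = 100000000;  // number of picks per method

int main()
{
    std::srand(static_cast<unsigned int>(std::time(NULL)));  // one seed for the whole run

    std::vector<unsigned long> counts(MAX_RANGE, 0);  // counts[v] = times v was picked

    std::time_t start = std::time(NULL);
    for (unsigned int i = 0; i < SAMPLE_SIZE; ++i)
    {
        unsigned int numPicked = std::rand() % MAX_RANGE;  // swap in the method under test
        ++counts[numPicked];
    }
    std::time_t end = std::time(NULL);

    std::cout << "Time to finish: " << static_cast<long>(end - start) << std::endl;
    return 0;
}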

I ran it with different sample sizes and different seeds, and each time I came to the same conclusion.

The supposedly “best” method was actually slightly worse than the “better” method.

How did I define what was good and what was bad? I already think that the method described in the article is simpler to write and understand, which means it wins when it comes to readability and maintainability. Of course, this code will likely be written once and left in some utility class, and the other methods are fairly easy to understand, so we’ll look at other criteria.

How flat is the distribution? If an RNG picks one number much more often than the rest, then it isn’t very good. Ideally, we would like something that is truly random, so the method that gives us the distribution closest to random would be best. Now, I am no expert on statistics, but my check was to subtract the minimum number of times a value was picked from the maximum number of times a value was picked. For an ideal distribution, the answer would be 0. If one distribution’s min-to-max range is closer to 0 than another’s, then we can probably assume that the method used to create the first distribution is closer to truly random.
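
In code, the check amounts to something like the following sketch, assuming the counts vector from the harness sketch above:

// Flatness check sketch: report the spread between the most-picked and
// least-picked values, given the counts vector built by the harness.
#include <algorithm>
#include <iostream>
#include <vector>

void reportSpread(const std::vector<unsigned long>& counts)
{
    unsigned long minCount = *std::min_element(counts.begin(), counts.end());
    unsigned long maxCount = *std::max_element(counts.begin(), counts.end());

    std::cout << maxCount << " - " << minCount
              << " = " << (maxCount - minCount) << std::endl;
}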

How long does it take to compute a result? If a method produces superior results but takes multiple seconds to calculate each time, it won’t really be useful in a video game which generally tries to calculate results within milliseconds.

So those were my two criteria. If you are very familiar with statistics, perhaps you have already seen a flaw with my test. Please let me know because I would rather have the truth than be correct.

Here are the results when run over a range of 10000 numbers and a sample size of 100000000, using a seed of time(NULL) (the same seed value was used for each method, so it was created once at the start of the program). Times are in seconds:


Time to finish first method: 9
Time to finish better method: 8
Time to finish best method: 12


Diff between min and max values:
First: 10398 - 9641 = 757
Better: 10414 - 9626 = 788
Best: 10416 - 9621 = 795

I ran it multiple times, and the Best method seemed to always have a large range between the minimum and the maximum number of times a value was picked. It also consistently ran 2 or 3 seconds slower than the other methods.

And using a seed of 0:


Time to finish first method: 9
Time to finish better method: 8
Time to finish best method: 11


Diff between min and max values
First: 10445 - 9629 = 816
Better: 10409 - 9586 = 823
Best: 10395 - 9621 = 774

I ran the test using a much smaller range of values. Instead of 10,000, I used 10.


Time to finish first method: 9
Time to finish better method: 8
Time to finish best method: 10


Diff between min and max values
First: 10006453 - 9992621 = 13832
Better: 10001752 - 9998124 = 3628
Best: 10001752 - 9998124 = 3628

Over a smaller range of values, it seems that the Best and Better methods have the same difference between the minimum and maximum number of times a value was picked. Best was still consistently slower, though.

So what do these results mean?

The article had this to say about the division trick:

“The trick of dividing by RAND_MAX uses the high order bits of a random number produced by rand(), thus alleviating the old problem of poor rand() implementations with non-random low order bits. But as I said, these days that problem has been largely fixed by library vendors, so the only thing you have managed to do is make the code more complicated. It buys you nothing, and before you start thinking that doing the same thing with floating-point will help, I assure you that it will not.”

Well, it seems that according to my tests it buys time, but then the article does mention using a uniform deviate from the random number. I then added that version to my test, and recomputed the results.
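
As I understand it, the uniform deviate approach scales rand() into a floating-point value in [0, 1) and then multiplies by the range, something like the sketch below (the exact expression in the article may differ, and the function name is just a placeholder):

// Uniform deviate sketch: scale rand() into a double in [0, 1), then
// multiply by the desired range.
#include <cstdlib>

unsigned int pickWithDeviate(unsigned int range)
{
    double deviate = std::rand() / (RAND_MAX + 1.0);    // uniform deviate in [0, 1)
    return static_cast<unsigned int>(deviate * range);  // value in [0, range)
}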

For a small range, I found that it ran at the same speed as the first two methods while also matching the Better and Best methods on the difference between the minimum and maximum number of times a value was picked.

For a very large range (100000 values), however, I got the following results:

Time to finish first method: 11
Time to finish better method: 11
Time to finish best method: 13
Time to finish deviate method: 11


Diff between min and max values
First: 1178 - 860 = 318
Better: 1140 - 238 = 902
Best: 1135 - 861 = 274
Deviate: 1142 - 865 = 277

As you can see, the uniform deviate method ran at about the same speed as the non-looping methods. While it was only slightly worse than the Best method at flattening the distribution, it blew away the Better method.

However, I would also like to note that the original method, which was supposedly so bad, seems to have a fairly flat distribution, too. Of course, once you get to a smaller range of values, it gets much, much worse. Over a range of only 10 values, the first method came back with over three times the difference of the other methods, while the deviate method worked about as well as the Best method, coming back with the same flatness in distribution.

Apparently, which method you use depends on the range of values you are picking from. If you are picking from a small range of values, then you would do well NOT to use the First method. If you are picking from a large range, it doesn’t seem to matter as much. In any case, the “Best” method doesn’t live up to its name, although the uniform deviate does correct its speed problem.

Potential Problems with These Tests:
Subtracting the minimum number of times a value is picked from the maximum number of times a value is picked might not really give me a meaningful value like I think it does. If you can think of a better way to handle this test, please feel free to correct me.

If you would like the C++ source, you can download it:
randomtest.tar.gz
randomtest.zip