Rand(0...1000): modulo or multiply method?

jj2007 · April 08, 2015, 05:57:00 AM

Most PRNGs produce a range that is higher than the desired one. For example, the CRT rand function returns a pseudorandom integer in the range 0 to RAND_MAX (32767), which is a bit poor for my taste, but that's not the topic here.

I've seen frequently in C/C++ forums that people use the modulo operator % to obtain the desired range:

One common way of choosing a random number in [0, n) is to take the result of rand() modulo n: rand() % n

Since I am nil in mathematics, I chose an apparently simpler way to limit the range: mov mydword, Rand(1000) would perform Rand(-1)*1000/(2^32-1) (more). Under the hood, Alex' AxRand is working, and it produces a 32 bit number. Note this is not to sell you Rand() or MasmBasic, I simply want to understand why C/C++ coders jump on the modulo method. I have written a little testbed counting how often integers in the range 0 ... n (n=20 Mio) are being selected with both methods. Through my eyes, the multiply method looks more convincing. What do the mathematicians among you think?

Code Select

#element        #matches with MULTIPLY method:
  321619        9
 2094528        9
 2601443        9
 4300802        9
 4731850        10
 4875710        9
 6610812        9
 7040809        9
 7177863        9
 7211096        9
 7293029        10
 8006909        9
 8270313        9
10799412        9
11435230        9
12687793        9
14475367        10
14628963        10
15364717        9
16691551        9
16817340        9
17900582        9
18319853        9
36.700% elements are zero, 36.878% elements appear once

The testbeds list only items that have been selected 9 and more times.

Code Select

#element        #matches with MODULO method:
       1        23
       3        20
       4        20
      12        20
      25        20
      37        21
      62        22
      63        20
      70        20
      71        22
      80        25
      85        20
     119        21
     223        21
     399        20
     634        20
    1021        20
    1175        20
    3279        20
    4238        20
   14430        21
   23255        20
50.003% elements are zero, 24.999% elements appear once

Many more outliers, and exactly half are never selected, exactly one quarter are selected once. Note also that all outliers appear in the first 0.1% of the sample. So is there a reason to prefer modulo that I just don't see as a layman?

rrr314159 · April 08, 2015, 07:00:21 AM

My 2 cent's worth:

From pure math point of view the two are equivalent, as common sense says. You have a bunch of random (uniform random) numbers, output of an r.v. X, want them to represent a smaller number, n, so put them equally into n bins. In multiply case the bins are contiguous, from X*(i-1)/n to X*i/n, in the other they're spaced throughout, including all numbers where X mod n = i. (If I understand you correctly). You have to take proper care, of course, to be sure bins contain equal samples. In mult case make sure rounding off is correct, in mod case throw away any tail which would increase only the lower bins. Any method at all which creates equal bins, and throws away minimal amount of numbers, should work just as well.

However that's only "pure" math: assuming that uniform dist is really perfect. In real life it won't be so the choice of bins can make a difference. For instance if the dist is incorrectly weighted more to the bottom, or top, the mod method should eliminate the bias but mult won't. Harder to imagine an opposite case, where mod retains the bias but not mult - maybe that's why some programmers favor it - but the question gets arbitrarily complicated once u admit there may be some sort of bias.

What your tests are doing is, examining how well your RNG performs! Seems it's not perfect given the uneven results. Need to examine how many 0's expected, dist of outliers expected. It's do-able of course; could be the mod method is actually giving "better" results, except for one thing, the bias towards low numbers. That can't be right.

Then there are the computer programming considerations ... for a C programmer isn't "mod" simply more convenient? But not in assembler.

Mult seems to work better for you, in practice; could be it very nicely compensates for your RNG bias; and theoretically it's fine so I wouldn't worry. Still those very different outcomes could stand some investigation from one who needs a dreary and painstaking task to while away the hours.

FORTRANS · April 08, 2015, 07:16:06 AM

Hi,

In general, the modulo will select using the lower bits in the random
number and multiply uses the upper bits. (I think.) Most LCG are
most random in the high bits, and not very random in the low bits.
For a true linear (uniform) distribution they should give similar
results.

I think you should modify your code for the modulo test to use
Rand(-1) and then divide by num. Or maybe I am not reading your
code correctly?

And dividing EDX:EAX by 2^32-1 moves the contents of EDX to EAX.
so you can save the divide and just move EDX to EAX.

Regards,

Steve N.

Edit:

Sorry, the comment about dividing by 2^32-1 only applies if EAX
is zeroed. EDX will have something different if EAX is non-zero using
the divide. Wasn't thinking.

SRN

jj2007 · April 08, 2015, 08:55:12 AM

Thanks, folks. It turns out there was a nasty error in the routine I posted. Here is a better version comparing the Windows RtlRandomEx algo and Rand():

Code Select

#element        #matches with MULTIPLY method, using RtlRandomEx
56.793% elements are zero, 13.511% elements appear once

#element        #matches with MODULO method:
36.804% elements are zero, 36.766% elements appear once


#element        #matches with MULTIPLY method, using MasmBasic Rand()
36.765% elements are zero, 36.826% elements appear once

#element        #matches with MODULO method:
36.794% elements are zero, 36.765% elements appear once

It turns out that they produce very similar results for the mod method, but only MasmBasic yields the same patterns for the multiply method, too. Very odd...

EDIT: Version 3 added. All it does is a bswap for the RtlRandomEx result, and voilà, all results resemble each other:

Code Select

Generating took 482 ms
#element        #matches with MULTIPLY method, using RtlRandomEx
36.771% elements are zero, 36.827% elements appear once, 1.9043% often

#element        #matches with MODULO method:
36.749% elements are zero, 36.843% elements appear once, 1.8968% often

Generating took 104 ms
#element        #matches with MULTIPLY method, using MasmBasic Rand()
36.773% elements are zero, 36.797% elements appear once, 1.8869% often

#element        #matches with MODULO method:
36.804% elements are zero, 36.768% elements appear once, 1.9033% often

The MASM Forum

News:

Rand(0...1000): modulo or multiply method?

jj2007

rrr314159

FORTRANS

jj2007