Most PRNGs produce a range that is higher than the desired one. For example, the CRT rand function returns a pseudorandom integer in the range 0 to RAND_MAX (32767) (https://msdn.microsoft.com/en-us/library/398ax69y.aspx), which is a bit poor for my taste, but that's not the topic here.
I've seen frequently in C/C++ forums that people use the modulo operator % to obtain the desired range:
One common way of choosing a random number in [0, n) is to take the result of rand() modulo n: rand() % n (//http://)
Since I am nil in mathematics, I chose an apparently simpler way to limit the range: mov mydword, Rand(1000) would perform Rand(-1)*1000/(2^32-1) (more (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1030)). Under the hood, Alex' AxRand is working, and it produces a 32 bit number. Note this is not to sell you Rand() or MasmBasic, I simply want to understand why C/C++ coders jump on the modulo method. I have written a little testbed counting how often integers in the range 0 ... n (n=20 Mio) are being selected with both methods. Through my eyes, the multiply method looks more convincing. What do the mathematicians among you think?
#element #matches with MULTIPLY method:
321619 9
2094528 9
2601443 9
4300802 9
4731850 10
4875710 9
6610812 9
7040809 9
7177863 9
7211096 9
7293029 10
8006909 9
8270313 9
10799412 9
11435230 9
12687793 9
14475367 10
14628963 10
15364717 9
16691551 9
16817340 9
17900582 9
18319853 9
36.700% elements are zero, 36.878% elements appear once
The testbeds list only items that have been selected 9 and more times.
#element #matches with MODULO method:
1 23
3 20
4 20
12 20
25 20
37 21
62 22
63 20
70 20
71 22
80 25
85 20
119 21
223 21
399 20
634 20
1021 20
1175 20
3279 20
4238 20
14430 21
23255 20
50.003% elements are zero, 24.999% elements appear once
Many more outliers, and exactly half are never selected, exactly one quarter are selected once. Note also that all outliers appear in the first 0.1% of the sample. So is there a reason to prefer modulo that I just don't see as a layman?
My 2 cent's worth:
From pure math point of view the two are equivalent, as common sense says. You have a bunch of random (uniform random) numbers, output of an r.v. X, want them to represent a smaller number, n, so put them equally into n bins. In multiply case the bins are contiguous, from X*(i-1)/n to X*i/n, in the other they're spaced throughout, including all numbers where X mod n = i. (If I understand you correctly). You have to take proper care, of course, to be sure bins contain equal samples. In mult case make sure rounding off is correct, in mod case throw away any tail which would increase only the lower bins. Any method at all which creates equal bins, and throws away minimal amount of numbers, should work just as well.
However that's only "pure" math: assuming that uniform dist is really perfect. In real life it won't be so the choice of bins can make a difference. For instance if the dist is incorrectly weighted more to the bottom, or top, the mod method should eliminate the bias but mult won't. Harder to imagine an opposite case, where mod retains the bias but not mult - maybe that's why some programmers favor it - but the question gets arbitrarily complicated once u admit there may be some sort of bias.
What your tests are doing is, examining how well your RNG performs! Seems it's not perfect given the uneven results. Need to examine how many 0's expected, dist of outliers expected. It's do-able of course; could be the mod method is actually giving "better" results, except for one thing, the bias towards low numbers. That can't be right.
Then there are the computer programming considerations ... for a C programmer isn't "mod" simply more convenient? But not in assembler.
Mult seems to work better for you, in practice; could be it very nicely compensates for your RNG bias; and theoretically it's fine so I wouldn't worry. Still those very different outcomes could stand some investigation from one who needs a dreary and painstaking task to while away the hours.
Hi,
In general, the modulo will select using the lower bits in the random
number and multiply uses the upper bits. (I think.) Most LCG are
most random in the high bits, and not very random in the low bits.
For a true linear (uniform) distribution they should give similar
results.
I think you should modify your code for the modulo test to use
Rand(-1) and then divide by num. Or maybe I am not reading your
code correctly?
And dividing EDX:EAX by 2^32-1 moves the contents of EDX to EAX.
so you can save the divide and just move EDX to EAX.
Regards,
Steve N.
Edit:
Sorry, the comment about dividing by 2^32-1 only applies if EAX
is zeroed. EDX will have something different if EAX is non-zero using
the divide. Wasn't thinking.
SRN
Thanks, folks. It turns out there was a nasty error in the routine I posted. Here is a better version comparing the Windows RtlRandomEx algo and Rand():
#element #matches with MULTIPLY method, using RtlRandomEx
56.793% elements are zero, 13.511% elements appear once
#element #matches with MODULO method:
36.804% elements are zero, 36.766% elements appear once
#element #matches with MULTIPLY method, using MasmBasic Rand()
36.765% elements are zero, 36.826% elements appear once
#element #matches with MODULO method:
36.794% elements are zero, 36.765% elements appear once
It turns out that they produce very similar results for the mod method, but only MasmBasic yields the same patterns for the multiply method, too. Very odd...
EDIT: Version 3 added. All it does is a bswap for the RtlRandomEx result, and voilĂ , all results resemble each other:Generating took 482 ms
#element #matches with MULTIPLY method, using RtlRandomEx
36.771% elements are zero, 36.827% elements appear once, 1.9043% often
#element #matches with MODULO method:
36.749% elements are zero, 36.843% elements appear once, 1.8968% often
Generating took 104 ms
#element #matches with MULTIPLY method, using MasmBasic Rand()
36.773% elements are zero, 36.797% elements appear once, 1.8869% often
#element #matches with MODULO method:
36.804% elements are zero, 36.768% elements appear once, 1.9033% often