It is probably buried somewhere, but I could not find any ASM floating-point pseudo-random number generator (PRNG). Sure, I found many ASM integer PRNGs :t
Actually, it is very easy to produce a floating-point PRNG from an integer one, using what we know (or might know) about IEEE 754.
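A minimal C sketch of that idea for REAL8. It assumes a 64-bit random integer is already available, and the helper name `bits_to_unit_double` is hypothetical, not code from this thread. The top 52 bits become the mantissa of a number whose exponent is that of 1.0, which puts it in [1.0, 2.0). Subtracting 1.0 then gives [0.0, 1.0).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map 64 random bits to a double in [0.0, 1.0). */
static double bits_to_unit_double(uint64_t r)
{
    /* Top 52 bits become the mantissa; 0x3FF0... is the exponent of 1.0,
       so the bit pattern encodes a value in [1.0, 2.0). */
    uint64_t bits = (r >> 12) | 0x3FF0000000000000ULL;
    double d;
    memcpy(&d, &bits, sizeof d);  /* reinterpret the bits, no numeric conversion */
    return d - 1.0;               /* shift the range down to [0.0, 1.0) */
}
```

For example, `bits_to_unit_double(1ULL << 63)` returns 0.5. The same trick works for REAL4 with 23 mantissa bits and the pattern 0x3F800000.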
Building a floating-point PRNG from an integer one is the main purpose of this essay, but I will use the opportunity to present the results for REAL8, REAL4 and HALF with 15, 6 and 3 significant digits (YES, printf has a little-known capability for that).
And this time I will show no mercy to people with computers that are too old (older than Ivy Bridge), sorry. I will use RDRAND for the random numbers, and the F16C instructions VCVTPS2PH and VCVTPH2PS for the HALFs, to keep the code short.
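VCVTPH2PS widens HALF (IEEE 754 binary16) values to REAL4. The following portable C sketch of the per-element conversion is only meant to show the format, and the name `half_to_float` is hypothetical. A HALF has 1 sign bit, 5 exponent bits with a bias of 15, and 10 stored mantissa bits. Those 11 significant bits, counting the implicit leading one, are why only about 3 decimal digits are meaningful. They also explain why the largest finite HALF is 65504.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one binary16 bit pattern to float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {                 /* zero or subnormal: mant * 2^-24 */
        f = (float)mant / 16777216.0f;
        return sign ? -f : f;
    }
    if (exp == 31)                  /* infinity or NaN: all-ones exponent */
        bits = sign | 0x7F800000u | (mant << 13);
    else                            /* normal: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```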
REAL8 Value (15 significant digits)=-9650.29779824708
REAL4 Value (6 significant digits)=-9650.3
HALF Value (3 significant digits)=-9.65e+003
REAL8 Value (15 significant digits)=-40373.2896508689
REAL4 Value (6 significant digits)=-40373.3
HALF Value (3 significant digits)=-4.04e+004
REAL8 Value (15 significant digits)=-22474.0836345981
REAL4 Value (6 significant digits)=-22474.1
HALF Value (3 significant digits)=-2.25e+004
REAL8 Value (15 significant digits)=43495.2071411472
REAL4 Value (6 significant digits)=43495.2
HALF Value (3 significant digits)=4.35e+004
REAL8 Value (15 significant digits)=42173.4249726214
REAL4 Value (6 significant digits)=42173.4
HALF Value (3 significant digits)=4.22e+004
REAL8 Value (15 significant digits)=10021.6594674816
REAL4 Value (6 significant digits)=10021.7
HALF Value (3 significant digits)=1e+004
REAL8 Value (15 significant digits)=14000.1614663608
REAL4 Value (6 significant digits)=14000.2
HALF Value (3 significant digits)=1.4e+004
REAL8 Value (15 significant digits)=7151.36409892466
REAL4 Value (6 significant digits)=7151.36
HALF Value (3 significant digits)=7.15e+003
REAL8 Value (15 significant digits)=16185.3460413327
REAL4 Value (6 significant digits)=16185.3
HALF Value (3 significant digits)=1.62e+004
REAL8 Value (15 significant digits)=-58318.6575650965
REAL4 Value (6 significant digits)=-58318.7
HALF Value (3 significant digits)=-5.83e+004
REAL8 Value (15 significant digits)=21377.3863558798
REAL4 Value (6 significant digits)=21377.4
HALF Value (3 significant digits)=2.14e+004
REAL8 Value (15 significant digits)=62768.6210416745
REAL4 Value (6 significant digits)=62768.6
HALF Value (3 significant digits)=6.28e+004
REAL8 Value (15 significant digits)=-7787.06495612929
REAL4 Value (6 significant digits)=-7787.06
HALF Value (3 significant digits)=-7.79e+003
REAL8 Value (15 significant digits)=42013.0045911184
REAL4 Value (6 significant digits)=42013
HALF Value (3 significant digits)=4.2e+004
REAL8 Value (15 significant digits)=-61322.6968192349
REAL4 Value (6 significant digits)=-61322.7
HALF Value (3 significant digits)=-6.13e+004
REAL8 Value (15 significant digits)=27707.4044548888
REAL4 Value (6 significant digits)=27707.4
HALF Value (3 significant digits)=2.77e+004
REAL8 Value (15 significant digits)=-4169.53884418745
REAL4 Value (6 significant digits)=-4169.54
HALF Value (3 significant digits)=-4.17e+003
REAL8 Value (15 significant digits)=21556.1687919651
REAL4 Value (6 significant digits)=21556.2
HALF Value (3 significant digits)=2.16e+004
REAL8 Value (15 significant digits)=54650.2552792932
REAL4 Value (6 significant digits)=54650.3
HALF Value (3 significant digits)=5.47e+004
REAL8 Value (15 significant digits)=29722.4088189263
REAL4 Value (6 significant digits)=29722.4
HALF Value (3 significant digits)=2.97e+004
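The three precisions in the listing come from printf's %g conversion with an explicit precision, which counts significant digits rather than decimals. Below is a small C sketch of that; the helper name `format_sig` is hypothetical. Current C runtimes print at least two exponent digits (e+03), while the old msvcrt behind the listing prints three (e+003).

```c
#include <stdio.h>

/* Hypothetical helper: write x with `digits` significant digits (%.*g). */
static int format_sig(char *buf, size_t size, double x, int digits)
{
    return snprintf(buf, size, "%.*g", digits, x);
}
```

With -9650.29779824708 and digits set to 15, 6 and 3, this yields "-9650.29779824708", "-9650.3" and "-9.65e+03".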
ObjAsm32 RNG :t
Quote from: HSE on September 16, 2018, 01:12:03 AM
ObjAsm32 RNG :t
:biggrin:
I stick to this old definition of Assembly Language (https://www.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/com.ibm.zos.v2r1.asma400/asmr102112.htm)
Exactly!!
Later:
Just in case, try the 32-bit version.
I think the 64-bit version only removes some macros implemented HLL-style in JWASM derivatives, but I'm not sure.
I wrote it correctly in the previous post: 32 bit.
I use these two small, fast hacks to calculate random real4 values.
A: between 0.0 and 1.0 (1.0 itself is never produced)
B: between -1.0 and 1.0 (1.0 itself is never produced)
.const
Fl32_1 real4 1.0
Fl32_3 real4 3.0
.data
align 4
Seed dd 476954562 ; initialize once with a seed value, can be anything except 0
MagicRnd dd 16807
Scale real4 255.0
.code
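;A
mov eax,Seed
mul MagicRnd ; edx:eax = Seed * 16807 (edx is clobbered), only eax (mod 2^32) is kept
mov Seed,eax ; new seed for the next call
shr eax,9 ; the top 23 bits become the mantissa
or eax,03f800000h ; exponent of 1.0: a real4 between 1.0 and 2.0
mov dword ptr [esp-4],eax ; scratch slot just below esp
movss xmm0,dword ptr [esp-4]
subss xmm0,Fl32_1 ; result = a random real4 between 0.0 and 1.0
; mulss xmm0,Scale ; optional: scale the result to 0.0 .. 255.0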
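;B
mov eax,Seed
mul MagicRnd ; edx:eax = Seed * 16807 (edx is clobbered), only eax (mod 2^32) is kept
mov Seed,eax ; new seed for the next call
shr eax,9 ; the top 23 bits become the mantissa
or eax,040000000h ; exponent of 2.0: a real4 between 2.0 and 4.0
mov dword ptr [esp-4],eax ; scratch slot just below esp
movss xmm0,dword ptr [esp-4]
subss xmm0,Fl32_3 ; result = a random real4 between -1.0 and 1.0
; mulss xmm0,Scale ; optional: scale the result to -255.0 .. 255.0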
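Here are the same two hacks transcribed to C, as a sketch for checking the ranges. The function and variable names are illustrative, not from the post. The multiply keeps only the low 32 bits, just as `mov Seed,eax` does after `mul MagicRnd`.

```c
#include <stdint.h>
#include <string.h>

static uint32_t seed = 476954562u;       /* anything except 0 */

/* One Lehmer step modulo 2^32 (unsigned overflow wraps). */
static uint32_t next_seed(void)
{
    seed *= 16807u;
    return seed;
}

/* Top 23 bits of r as mantissa, combined with the given exponent bits. */
static float mantissa_float(uint32_t r, uint32_t expo_bits)
{
    uint32_t bits = (r >> 9) | expo_bits;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hack A: [1.0, 2.0) - 1.0 -> [0.0, 1.0) */
static float rand_a(void) { return mantissa_float(next_seed(), 0x3F800000u) - 1.0f; }

/* Hack B: [2.0, 4.0) - 3.0 -> [-1.0, 1.0) */
static float rand_b(void) { return mantissa_float(next_seed(), 0x40000000u) - 3.0f; }
```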
New, fully commented code: up to 4 random real4 values at once.
Still have to check if the distribution is satisfactory....
.const
Shuffle MACRO V0,V1,V2,V3
EXITM %((V0 shl 6) or (V1 shl 4) or (V2 shl 2) or (V3))
ENDM
align 16
Range12 dd 03f800000h,03f800000h,03f800000h,03f800000h
Range24 dd 040000000h,040000000h,040000000h,040000000h
MagicRnd dd 16807,16807,16807,16807
Fl32_1 real4 1.0,1.0,1.0,1.0
Fl32_3 real4 3.0,3.0,3.0,3.0
.data
align 16
Seed dd 476954562,473954562,471954562,479954562 ; initialize once with seed values
Scale real4 255.0,255.0,255.0,255.0
.code
A:
movdqa xmm0,oword ptr Seed ; Get the 4 seeds
movdqa xmm2,oword ptr MagicRnd ; Get the 4 MagicRnds
pshufd xmm1,xmm0,Shuffle(0,1,2,3) ; Reverse the dwords: seeds 3 and 1 move to lanes 0 and 2
pmuludq xmm0,xmm2 ; Multiply seeds 0 and 2 (pmuludq only uses dwords 0 and 2) -> 2 qwords
pmuludq xmm1,xmm2 ; Multiply seeds 3 and 1 -> 2 qwords
shufps xmm0,xmm1,Shuffle(2,0,2,0) ; Keep the low 32 bits of the 4 qwords (lane order 0,2,3,1)
movdqa oword ptr Seed,xmm0 ; Save the new 4 Seeds for the next run
psrld xmm0,9 ; Keep the top 23 bits as the 4 mantissas
orps xmm0,oword ptr Range12 ; Generate 4 real4 random numbers between 1.0 and 2.0
subps xmm0,oword ptr Fl32_1 ; Set the ranges of the 4 values between 0.0 and 1.0
; mulps xmm0,oword ptr Scale ; if you want to scale the values up to a range of your choice
ret
B:
movdqa xmm0,oword ptr Seed ; Get the 4 seeds
movdqa xmm2,oword ptr MagicRnd ; Get the 4 MagicRnds
pshufd xmm1,xmm0,Shuffle(0,1,2,3) ; Reverse the dwords: seeds 3 and 1 move to lanes 0 and 2
pmuludq xmm0,xmm2 ; Multiply seeds 0 and 2 (pmuludq only uses dwords 0 and 2) -> 2 qwords
pmuludq xmm1,xmm2 ; Multiply seeds 3 and 1 -> 2 qwords
shufps xmm0,xmm1,Shuffle(2,0,2,0) ; Keep the low 32 bits of the 4 qwords (lane order 0,2,3,1)
movdqa oword ptr Seed,xmm0 ; Save the new 4 Seeds for the next run
psrld xmm0,9 ; Keep the top 23 bits as the 4 mantissas
orps xmm0,oword ptr Range24 ; Generate 4 real4 random numbers between 2.0 and 4.0
subps xmm0,oword ptr Fl32_3 ; Set the ranges of the 4 values between -1.0 and 1.0
; mulps xmm0,oword ptr Scale ; if you want to scale the values up to a range of your choice
ret
EDIT: one pmuludq is not enough to get 4 random values, because it only multiplies dwords 0 and 2.
We need 2 pmuludq instructions and keep the low 32 bits of each of the 4 qwords.
REPLACED THE OLD CODE WITH THIS NEW CODE !!!!! :icon_redface:
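To see why two PMULUDQs are needed, here is a scalar C model of the 4-lane update above. It is a sketch for illustration only, with hypothetical names and no intrinsics. PMULUDQ reads only dwords 0 and 2, so PSHUFD 0x1B first reverses the seeds to bring seeds 3 and 1 into those lanes. SHUFPS 0x88 then keeps the low dword of each 64-bit product.

```c
#include <stdint.h>

/* Hypothetical model of PMULUDQ on XMM: only dwords 0 and 2 are multiplied. */
static void pmuludq_model(const uint32_t a[4], const uint32_t b[4], uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}

/* Model of one 4-seed update (pshufd, 2 x pmuludq, shufps). */
static void step4(uint32_t seed[4])
{
    const uint32_t magic[4] = {16807u, 16807u, 16807u, 16807u};
    const uint32_t reversed[4] = {seed[3], seed[2], seed[1], seed[0]}; /* pshufd 0x1B */
    uint64_t p0[2], p1[2];

    pmuludq_model(seed, magic, p0);      /* seeds 0 and 2 */
    pmuludq_model(reversed, magic, p1);  /* seeds 3 and 1 */

    /* shufps 0x88: low dwords of the qwords, in lane order 0, 2, 3, 1 */
    seed[0] = (uint32_t)p0[0];
    seed[1] = (uint32_t)p0[1];
    seed[2] = (uint32_t)p1[0];
    seed[3] = (uint32_t)p1[1];
}
```

Starting from seeds {1, 2, 3, 4}, one step gives {16807, 50421, 67228, 33614}. Each value is still one Lehmer step of one of the old seeds; only the lane order changes.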
Good job :t
Quote from: Siekmanski on September 16, 2018, 05:30:48 PM
Still have to check if the distribution is satisfactory....
Maybe not: it is a Lehmer random number generator :(. At least it is not as good as rdrand, which is certified by the NSA :badgrin:
It was not reliable enough because of the behaviour of "pmuludq".
I had to split it into 2 parts to get the full 32-bit range for the 4 seeds (I didn't notice it before...).
See Reply #5 for the new code.
This will be enough for audio and graphics programming, and it's fast. :biggrin:
Quote from: Siekmanski on September 16, 2018, 11:11:29 PM
It was not reliable enough because of the behaviour of "pmuludq"
Yes, it's pretty confusing ;-)
Multiplies the first operand (destination operand) by the second operand (source operand) and stores the result in the destination operand. The source operand can be an unsigned doubleword integer stored in the low doubleword of an MMX™ technology register or a 64-bit memory location, or it can be two packed unsigned doubleword integers stored in the first (low) and third doublewords of an XMM register or a 128-bit memory location. The destination operand can be an unsigned doubleword integer stored in the low doubleword of an MMX register or two packed doubleword integers stored in the first and third doublewords of an XMM register. When packed doubleword operands are used, a SIMD multiply is performed on two sets of values, producing two results. When a quadword result is too large to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination element (that is, the carry is ignored).
Quote from: jj2007 on September 16, 2018, 11:34:55 PM
Yes, it's pretty confusing ;-)
You need a drawing. Try searching with Google and click the Images tab. :idea:
Quote from: AW on September 17, 2018, 02:13:41 AM
You need a drawing
My red highlighting above is enough. It's pretty simple once you've understood it.
Hi Siekmanski.
Thank you very much for your random for 4 x real4 values.
This is awesome for game programming, just like your 15-timers-at-once code from here:
http://masm32.com/board/index.php?topic=7060.48 (MultimediaTimers.zip)
I use jj2007's formula from here
http://masm32.com/board/index.php?topic=7419. (Reply #2)
to fill the seed:
AutoSetRandom proc uses ebx esi edi ; cpuid overwrites ebx; esi and edi (object pointer) are preserved too
mov edi, offset Seed
xor esi, esi
.Repeat
invoke Sleep, 1 ; leave the time slice
xor eax, eax ; cpuid leaf 0
cpuid ; serialise
rdtsc
mov byte ptr[edi+esi], al ; keep only the fast-changing low byte
inc esi
.Until( esi > 15 ) ; 16 bytes = the 4 dword seeds
ret
AutoSetRandom endp
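In C terms, the loop above packs 16 low bytes into the four little-endian dword seeds. Below is a sketch that assumes the 16 counter samples have already been collected; the function name is hypothetical. It also forces each seed odd, which is not in the original code. That guarantees the "anything except 0" rule from the earlier post, and odd seeds are also required for the longest period of a multiplicative generator modulo 2^32.

```c
#include <stdint.h>

/* Hypothetical helper: pack 16 sample bytes into 4 seeds (little-endian). */
static void seeds_from_bytes(uint32_t seed[4], const uint8_t sample[16])
{
    for (int lane = 0; lane < 4; lane++) {
        uint32_t s = 0;
        for (int k = 0; k < 4; k++)          /* byte k goes to bits 8k..8k+7 */
            s |= (uint32_t)sample[lane * 4 + k] << (8 * k);
        seed[lane] = s | 1u;                 /* assumption: force odd, hence never 0 */
    }
}
```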
Thanks also to jj2007 for this code.
For a crypto RNG, rdrand, yep, but it's slow as heck.
For PRNG, the best option at the moment is XoroShiro128+
Its distribution and statistical characteristics are excellent, and performance is fantastic: sub-nanosecond per result.
I've been using this approach with much success over standard PRNG algorithms, in terms of both quality and performance.
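For reference, here is a C sketch of xoroshiro128+ as published by Blackman and Vigna. It uses the 2018 parameters (24, 16, 37), since earlier versions of the generator used different shift constants. The state names are illustrative, and the state must not be all zeros. The double conversion takes the upper 53 bits because the lowest output bits of this generator are its weakest.

```c
#include <stdint.h>

static uint64_t xo_state[2] = {1, 2};   /* any value except all zeros */

static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* One xoroshiro128+ step (a = 24, b = 16, c = 37). */
static uint64_t xoroshiro128plus_next(void)
{
    const uint64_t s0 = xo_state[0];
    uint64_t s1 = xo_state[1];
    const uint64_t result = s0 + s1;

    s1 ^= s0;
    xo_state[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
    xo_state[1] = rotl64(s1, 37);
    return result;
}

/* Upper 53 bits scaled by 2^-53: a double in [0.0, 1.0). */
static double xoroshiro128plus_double(void)
{
    return (double)(xoroshiro128plus_next() >> 11) * 0x1.0p-53;
}
```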
Another alternative, depending on your application, is to use low-discrepancy sequences such as Halton, which in some cases give much better results than a PRNG (for example, faster convergence).
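A Halton sequence is built from the radical inverse: write the index in base b and mirror its digits around the radix point. Here is a minimal C sketch of one dimension; the function name is hypothetical. Multi-dimensional points use a different prime base per coordinate, for example 2 and 3.

```c
/* Radical inverse of `index` in `base` (base >= 2): the index-th Halton value. */
static double halton(unsigned index, unsigned base)
{
    double f = 1.0, r = 0.0;
    while (index > 0) {
        f /= base;                 /* weight of the next digit: base^-k */
        r += f * (index % base);   /* add the mirrored digit */
        index /= base;
    }
    return r;
}
```

In base 2 the first values are 0.5, 0.25, 0.75, 0.125, each new point falling into the largest remaining gap.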
Another fast one is the PCG Family: http://www.pcg-random.org/
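To show what the PCG family looks like, here is a C sketch of the PCG32 step in the style of the minimal C implementation from that site: a 64-bit LCG whose old state is permuted into a 32-bit output (XSH RR). Seeding is left out, and the increment must be odd, which the code forces with `| 1`.

```c
#include <stdint.h>

typedef struct { uint64_t state; uint64_t inc; } pcg32_random_t;

/* PCG32: advance the 64-bit LCG, output a permuted 32-bit value (XSH RR). */
static uint32_t pcg32_random_r(pcg32_random_t *rng)
{
    uint64_t oldstate = rng->state;
    rng->state = oldstate * 6364136223846793005ULL + (rng->inc | 1);
    uint32_t xorshifted = (uint32_t)(((oldstate >> 18u) ^ oldstate) >> 27u);
    uint32_t rot = (uint32_t)(oldstate >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((-rot) & 31));
}
```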
Quote from: johnsa on September 28, 2018, 06:31:10 PM
For PRNG, the best option at the moment is XoroShiro128+
As Marinus mentioned already, there is PCG32. They claim it's better than XoroShiro128, as proven with PractRand. It's a long and controversial issue, though.
I can't find any claim that it's better than xoroshiro128+, only that it's better than xorshift or xoroshiro64, according to some other links:
https://nullprogram.com/blog/2017/09/21/
http://xoshiro.di.unimi.it/#shootout
http://www.pcg-random.org/posts/birthday-test.html
I'll stick with it for now, either way.
Quote
For Crypto RNG, rdrand.. yep.. but it's slow as heck
Slow but unpredictable.
Not everybody wants fast food, sometimes it is better to wait a couple of nanoseconds more to eat something based on entropy.
Indeed, for the right application it's ideal. I was planning on using it as a source of entropy/randomness to work on my telekinesis skills.. :) with a nice gui to boot.. haha
Quote from: johnsa on September 29, 2018, 01:51:09 AM
I can't find any claim that it's better than xoroshiro128+, only xorshift or xoroshiro64
Google for "O'Neill" "Vigna" to see an interesting controversy.
P.S., from your nullprogram link:
Quote
August 2018 Update: xoroshiro128+ fails PractRand very badly.
http://www.pcg-random.org/posts/a-quick-look-at-xoshiro256.html is also a good read 8)
Ahh, good spot, and some interesting reading.
I'll give the 256** version a go and see how it fares.
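For trying the 256** version, here is a C sketch of the xoshiro256** step as published by Blackman and Vigna. The state name is illustrative, the state must not be all zeros, and seeding is left out.

```c
#include <stdint.h>

static uint64_t xs_state[4] = {1, 2, 3, 4};   /* any value except all zeros */

static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* xoshiro256**: scrambled output from s[1], then the linear state update. */
static uint64_t xoshiro256ss_next(void)
{
    const uint64_t result = rotl64(xs_state[1] * 5, 7) * 9;
    const uint64_t t = xs_state[1] << 17;

    xs_state[2] ^= xs_state[0];
    xs_state[3] ^= xs_state[1];
    xs_state[1] ^= xs_state[2];
    xs_state[0] ^= xs_state[3];
    xs_state[2] ^= t;
    xs_state[3] = rotl64(xs_state[3], 45);
    return result;
}
```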