Floating point PRNG

aw27 · September 16, 2018, 01:01:45 AM

Probably it is buried somewhere, but I could not find any ASM floating-point pseudo-random number generator (PRNG). Sure, I found many ASM integer PRNGs :t
Actually, it is very easy to build a floating-point PRNG from an integer one, using the knowledge we have (or might have) about IEEE 754.
That is the main purpose of this essay, but I will also take the opportunity to present the results as REAL8, REAL4 and HALF with 15, 6 and 3 significant digits (YES, printf does have a little-known capability for that).
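
To make the IEEE 754 idea concrete, here is a minimal sketch in Python. It is not the code behind this post, and `bits_to_unit_double` is a made-up name. It assumes the usual trick: put 52 random bits into the mantissa of a REAL8 whose exponent field encodes 1.0, which gives a value in [1.0, 2.0), then subtract 1.0.

```python
import struct

def bits_to_unit_double(random_bits: int) -> float:
    """Map the low 52 bits of an integer to a REAL8 in [0.0, 1.0)."""
    mantissa = random_bits & ((1 << 52) - 1)   # keep 52 mantissa bits
    pattern = 0x3FF0000000000000 | mantissa    # sign 0, exponent field of 1.0
    (value,) = struct.unpack("<d", struct.pack("<Q", pattern))
    return value - 1.0                         # [1.0, 2.0) -> [0.0, 1.0)
```

In ASM the integer would come from rdrand or any integer PRNG; scaling the result to another range is then one multiply and one add.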

And this time I will have no mercy for people whose computers are too old (older than Ivy Bridge), sorry. To keep the code short, I will use rdrand for the random numbers, and vcvtps2ph and vcvtph2ps for the HALFs.
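
For readers without an F16C-capable CPU, the REAL8 to REAL4 to HALF narrowing and the printf precisions can be checked in Python. This is only a sketch under assumptions: `struct`'s `e` format stands in for vcvtps2ph, the helper names are hypothetical, and the format strings `%.15g`, `%.6g` and `%.3g` are my guess at the printf capability mentioned above.

```python
import struct

def to_real4(x: float) -> float:
    """Round a REAL8 to the nearest REAL4 and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_half(x: float) -> float:
    """Round to the nearest HALF (binary16) and widen it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def describe(x: float) -> list:
    """Format one value the way the three output lines do."""
    real4 = to_real4(x)
    half = to_half(real4)          # HALF is derived from the REAL4, as with vcvtps2ph
    return ["%.15g" % x, "%.6g" % real4, "%.3g" % half]

print(describe(-9650.29779824708))
```

Python prints two-digit exponents (e+03) where the older MSVC runtime prints three (e+003); apart from that the digits should agree with the listing.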

REAL8 Value (15 significant digits)=-9650.29779824708
REAL4 Value (6 significant digits)=-9650.3
HALF Value (3 significant digits)=-9.65e+003


REAL8 Value (15 significant digits)=-40373.2896508689
REAL4 Value (6 significant digits)=-40373.3
HALF Value (3 significant digits)=-4.04e+004


REAL8 Value (15 significant digits)=-22474.0836345981
REAL4 Value (6 significant digits)=-22474.1
HALF Value (3 significant digits)=-2.25e+004


REAL8 Value (15 significant digits)=43495.2071411472
REAL4 Value (6 significant digits)=43495.2
HALF Value (3 significant digits)=4.35e+004


REAL8 Value (15 significant digits)=42173.4249726214
REAL4 Value (6 significant digits)=42173.4
HALF Value (3 significant digits)=4.22e+004


REAL8 Value (15 significant digits)=10021.6594674816
REAL4 Value (6 significant digits)=10021.7
HALF Value (3 significant digits)=1e+004


REAL8 Value (15 significant digits)=14000.1614663608
REAL4 Value (6 significant digits)=14000.2
HALF Value (3 significant digits)=1.4e+004


REAL8 Value (15 significant digits)=7151.36409892466
REAL4 Value (6 significant digits)=7151.36
HALF Value (3 significant digits)=7.15e+003


REAL8 Value (15 significant digits)=16185.3460413327
REAL4 Value (6 significant digits)=16185.3
HALF Value (3 significant digits)=1.62e+004


REAL8 Value (15 significant digits)=-58318.6575650965
REAL4 Value (6 significant digits)=-58318.7
HALF Value (3 significant digits)=-5.83e+004


REAL8 Value (15 significant digits)=21377.3863558798
REAL4 Value (6 significant digits)=21377.4
HALF Value (3 significant digits)=2.14e+004


REAL8 Value (15 significant digits)=62768.6210416745
REAL4 Value (6 significant digits)=62768.6
HALF Value (3 significant digits)=6.28e+004


REAL8 Value (15 significant digits)=-7787.06495612929
REAL4 Value (6 significant digits)=-7787.06
HALF Value (3 significant digits)=-7.79e+003


REAL8 Value (15 significant digits)=42013.0045911184
REAL4 Value (6 significant digits)=42013
HALF Value (3 significant digits)=4.2e+004


REAL8 Value (15 significant digits)=-61322.6968192349
REAL4 Value (6 significant digits)=-61322.7
HALF Value (3 significant digits)=-6.13e+004


REAL8 Value (15 significant digits)=27707.4044548888
REAL4 Value (6 significant digits)=27707.4
HALF Value (3 significant digits)=2.77e+004


REAL8 Value (15 significant digits)=-4169.53884418745
REAL4 Value (6 significant digits)=-4169.54
HALF Value (3 significant digits)=-4.17e+003


REAL8 Value (15 significant digits)=21556.1687919651
REAL4 Value (6 significant digits)=21556.2
HALF Value (3 significant digits)=2.16e+004


REAL8 Value (15 significant digits)=54650.2552792932
REAL4 Value (6 significant digits)=54650.3
HALF Value (3 significant digits)=5.47e+004


REAL8 Value (15 significant digits)=29722.4088189263
REAL4 Value (6 significant digits)=29722.4
HALF Value (3 significant digits)=2.97e+004

HSE · September 16, 2018, 01:12:03 AM

ObjAsm32 RNG :t

aw27 · September 16, 2018, 02:16:40 AM

Quote from: HSE on September 16, 2018, 01:12:03 AM
ObjAsm32 RNG :t

I stick to this old definition of Assembly Language

HSE · September 16, 2018, 02:30:47 AM

Exactly!!

Later:
Just in case, try the 32-bit version.
I think the 64-bit version only drops some macros that JWASM derivatives implement as HLL constructs, but I'm not sure.

I did write it correctly in my previous post: 32 bit.

Siekmanski · September 16, 2018, 08:22:14 AM

I use these 2 fast and small hacks to calculate random real4 values.
A: between 0.0 (inclusive) and 1.0 (exclusive)
B: between -1.0 (inclusive) and 1.0 (exclusive)

.const
Fl32_1      real4 1.0
Fl32_3      real4 3.0

.data
align 4
Seed        dd 476954562   ; initialize once with a seed value, can be anything except 0
MagicRnd    dd 16807
Scale       real4 255.0

.code
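;A: random real4 in [0.0, 1.0)
    mov     eax,Seed
    mul     MagicRnd            ; edx:eax = Seed * 16807 (edx is clobbered)
    mov     Seed,eax            ; low 32 bits become the next seed
    shr     eax,9               ; top 23 bits become the mantissa
    or      eax,03f800000h      ; exponent of 1.0 -> value in [1.0, 2.0)
    movd    xmm0,eax            ; avoids storing below esp
    subss   xmm0,Fl32_1         ; result = a random real4 between 0.0 and 1.0
;   mulss   xmm0,Scale          ; optional: scale to a range of your choice

;B: random real4 in [-1.0, 1.0)
    mov     eax,Seed
    mul     MagicRnd            ; edx:eax = Seed * 16807 (edx is clobbered)
    mov     Seed,eax            ; low 32 bits become the next seed
    shr     eax,9               ; top 23 bits become the mantissa
    or      eax,040000000h      ; exponent of 2.0 -> value in [2.0, 4.0)
    movd    xmm0,eax            ; avoids storing below esp
    subss   xmm0,Fl32_3         ; result = a random real4 between -1.0 and 1.0
;   mulss   xmm0,Scale          ; optional: scale to a range of your choice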
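
The two hacks can be modelled bit for bit in Python, which is handy for checking the range limits without a debugger. This is a sketch, not a replacement for the ASM; the function names are hypothetical.

```python
import struct

MAGIC = 16807
MASK32 = 0xFFFFFFFF

def lcg_next(seed: int) -> int:
    """mul MagicRnd / mov Seed,eax: keep the low 32 bits of the product."""
    return (seed * MAGIC) & MASK32

def bits_to_real4(pattern: int) -> float:
    """Reinterpret a 32-bit pattern as a real4."""
    return struct.unpack("<f", struct.pack("<I", pattern))[0]

def hack_a(new_seed: int) -> float:
    """shr 9 / or 3f800000h / subss 1.0 -> value in [0.0, 1.0)."""
    return bits_to_real4((new_seed >> 9) | 0x3F800000) - 1.0

def hack_b(new_seed: int) -> float:
    """shr 9 / or 40000000h / subss 3.0 -> value in [-1.0, 1.0)."""
    return bits_to_real4((new_seed >> 9) | 0x40000000) - 3.0

seed = lcg_next(476954562)          # same starting seed as the listing
print(hack_a(seed), hack_b(seed))
```

The largest value A can return is 1 - 2^-23 and the largest for B is 1 - 2^-22, so the upper bound 1.0 itself is never produced.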

Siekmanski · September 16, 2018, 05:30:48 PM

New fully commented code, maximum 4 random real4 values at once.
Still have to check if the distribution is satisfactory....

.const

Shuffle MACRO V0,V1,V2,V3
    EXITM %((V0 shl 6) or (V1 shl 4) or (V2 shl 2) or (V3))
ENDM

align 16
Range12     dd    03f800000h,03f800000h,03f800000h,03f800000h
Range24     dd    040000000h,040000000h,040000000h,040000000h
MagicRnd    dd    16807,16807,16807,16807
Fl32_1      real4 1.0,1.0,1.0,1.0
Fl32_3      real4 3.0,3.0,3.0,3.0

.data
align 16
Seed        dd    476954562,473954562,471954562,479954562   ; initialize once with seed values
Scale       real4 255.0,255.0,255.0,255.0

.code

A:
    movdqa  xmm0,oword ptr Seed         ; Get the 4 seeds
    movdqa  xmm2,oword ptr MagicRnd     ; Get the 4 MagicRnds
    pshufd  xmm1,xmm0,Shuffle(0,1,2,3)  ; shuffle the second multiplication in place
    pmuludq xmm0,xmm2                   ; Save the first pair qword multiply results
    pmuludq xmm1,xmm2                   ; Save the second pair qword multiply results
    shufps  xmm0,xmm1,Shuffle(2,0,2,0)  ; Save the low 32 bit parts from the 4 qwords
    movdqa  oword ptr Seed,xmm0         ; Save the new 4 Seeds for the next run
    psrld   xmm0,9                      ; Shift them to the fractional parts
    orps    xmm0,oword ptr Range12      ; Generate 4 real4 random numbers between 1.0 and 2.0
    subps   xmm0,oword ptr Fl32_1       ; Set the ranges of the 4 values between 0.0 and 1.0
;    mulps   xmm0,oword ptr Scale       ; if you want to scale the values up to a range of your choice 
    ret

B:
    movdqa  xmm0,oword ptr Seed         ; Get the 4 seeds
    movdqa  xmm2,oword ptr MagicRnd     ; Get the 4 MagicRnds
    pshufd  xmm1,xmm0,Shuffle(0,1,2,3)  ; shuffle the second multiplication in place
    pmuludq xmm0,xmm2                   ; Save the first pair qword multiply results
    pmuludq xmm1,xmm2                   ; Save the second pair qword multiply results
    shufps  xmm0,xmm1,Shuffle(2,0,2,0)  ; Save the low 32 bit parts from the 4 qwords
    movdqa  oword ptr Seed,xmm0         ; Save the new 4 Seeds for the next run
    psrld   xmm0,9                      ; Shift them to the fractional parts
    orps    xmm0,oword ptr Range24      ; Generate 4 real4 random numbers between 2.0 and 4.0
    subps   xmm0,oword ptr Fl32_3       ; Set the ranges of the 4 values between -1.0 and 1.0
;    mulps   xmm0,oword ptr Scale       ; if you want to scale the values up to a range of your choice 
    ret

EDIT: A single pmuludq is not enough to get 4 random values, because it multiplies only the first and third dwords of each register.
We need 2 pmuludq and then keep the low 32 bits of each of the 4 qword results.

REPLACED THE OLD CODE WITH THIS NEW CODE !!!!! :icon_redface:
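
The lane bookkeeping in the new code is easy to get wrong, so here is a small Python model of one update step. It is a sketch with hypothetical helper names; lists of four integers stand in for the XMM registers, with index 0 being the lowest dword.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def pmuludq(a, b):
    """Model of pmuludq: multiply dwords 0 and 2 only, giving two qwords."""
    return [(a[0] * b[0]) & MASK64, (a[2] * b[2]) & MASK64]

def step(seeds, magic=16807):
    """One seed update as done by routines A and B."""
    magics = [magic] * 4
    swapped = seeds[::-1]             # pshufd with Shuffle(0,1,2,3) reverses the dwords
    first = pmuludq(seeds, magics)    # s0*m and s2*m
    second = pmuludq(swapped, magics) # s3*m and s1*m
    # shufps with Shuffle(2,0,2,0) keeps dwords 0 and 2 of each register,
    # which are the low halves of the four qword products.
    return [first[0] & MASK32, first[1] & MASK32,
            second[0] & MASK32, second[1] & MASK32]

print(step([1, 2, 3, 4]))
```

The model shows that the four streams change lanes on every call (lane order s0, s2, s3, s1), which is harmless because each lane is still multiplied by the same constant. On CPUs with SSE4.1, pmulld would produce the four low dwords in a single instruction.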

aw27 · September 16, 2018, 10:14:58 PM

Good job :t

Quote from: Siekmanski on September 16, 2018, 05:30:48 PM
Still have to check if the distribution is satisfactory....

Maybe not; it is a Lehmer random number generator :(. At least it is not as good as rdrand, whose generator Intel documents as compliant with NIST SP 800-90A.
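
For context, a sketch in Python comparing the classic Lehmer (Park-Miller) recurrence modulo 2^31 - 1 with the modulo 2^32 variant that `mul` followed by keeping eax produces. The function names are mine, and the check value 1043618065 is the one I remember from the Park-Miller paper for 10000 steps starting at seed 1.

```python
M31 = 2 ** 31 - 1      # prime modulus of the "minimal standard" generator

def minstd_next(seed: int) -> int:
    """Classic Lehmer step: multiply by 16807 modulo 2^31 - 1."""
    return (seed * 16807) % M31

def lcg32_next(seed: int) -> int:
    """Variant used in the posted code: multiply by 16807 modulo 2^32."""
    return (seed * 16807) & 0xFFFFFFFF

def low_bits(seed: int, steps: int) -> set:
    """Collect the lowest bit of the modulo 2^32 variant over some steps."""
    seen = set()
    for _ in range(steps):
        seed = lcg32_next(seed)
        seen.add(seed & 1)
    return seen

x = 1
for _ in range(10000):
    x = minstd_next(x)
print(x)                              # Park-Miller check value
print(low_bits(476954562, 1000))      # the lowest bit never changes
```

With an odd multiplier and a power-of-two modulus the lowest bit of the seed is fixed forever and the other low bits cycle with short periods, which is why dropping the low 9 bits with `shr 9` helps the posted code.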

Siekmanski · September 16, 2018, 11:11:29 PM

It was not reliable enough because of the behaviour of "pmuludq".
I had to split it up into 2 parts to get the full 32-bit range for the 4 seeds (I didn't notice it before....).
See Reply #5 for the new code.

This will be enough for audio and graphics programming, and it's fast.

jj2007 · September 16, 2018, 11:34:55 PM

Quote from: Siekmanski on September 16, 2018, 11:11:29 PM
It was not reliable enough because of the behaviour of "pmuludq"

Yes, it's pretty confusing ;-)

Multiplies the first operand (destination operand) by the second operand (source operand) and stores the result in the destination operand. The source operand can be an unsigned doubleword integer stored in the low doubleword of an MMX technology register or a 64-bit memory location, or it can be two packed unsigned doubleword integers stored in the first (low) and third doublewords of an XMM register or a 128-bit memory location. The destination operand can be an unsigned doubleword integer stored in the low doubleword of an MMX register or two packed doubleword integers stored in the first and third doublewords of an XMM register. When packed doubleword operands are used, a SIMD multiply is performed on two sets of values, producing two results. When a quadword result is too large to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination element (that is, the carry is ignored).

aw27 · September 17, 2018, 02:13:41 AM

Quote from: jj2007 on September 16, 2018, 11:34:55 PM
Yes, it's pretty confusing ;-)

You need a drawing. Try to search with google and click the images tab. :idea:

jj2007 · September 17, 2018, 02:21:59 AM

Quote from: AW on September 17, 2018, 02:13:41 AM
You need a drawing

My red highlighting above is enough. It's pretty simple once you've understood it.

Caché GB · September 25, 2018, 12:37:04 PM

Hi Siekmanski.

Thank you very much for your random generator for 4 x real4 values.
This is awesome for game programming, just like your 15-at-once timers from here:

http://masm32.com/board/index.php?topic=7060.48 (MultimediaTimers.zip)

Using jj2007's formula from here

http://masm32.com/board/index.php?topic=7419. (Reply #2)

to fill the seed

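AutoSetRandom proc uses ebx esi edi    ; cpuid clobbers ebx; esi and edi are callee-saved

          mov  edi, offset Seed
          xor  esi, esi
    .Repeat

       invoke  Sleep, 1        ; leave the time slice

          xor  eax, eax        ; select a defined CPUID leaf
        cpuid                  ; serialise (clobbers eax, ebx, ecx, edx)
        rdtsc                  ; edx:eax = time-stamp counter
          mov  byte ptr[edi+esi], al   ; keep only the low byte
          inc  esi

     .Until( esi > 15 )        ; 16 bytes = 4 dword seeds

          ret

AutoSetRandom  endp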

Thank you jj2007 also for this code.
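
The same seeding idea, sketched in Python for anyone who wants to experiment with it outside ASM. Here `time.perf_counter_ns()` stands in for rdtsc and `time.sleep` for Sleep; forcing each seed to be odd is my own addition (it also rules out a zero seed) and is not part of the routine above.

```python
import time

def auto_seed(byte_count: int = 16) -> list:
    """Build 32-bit seeds from the low byte of a fast counter, one byte per sleep."""
    raw = bytearray()
    for _ in range(byte_count):
        time.sleep(0.001)                          # leave the time slice
        raw.append(time.perf_counter_ns() & 0xFF)  # keep only the low byte
    seeds = []
    for i in range(0, byte_count, 4):
        value = int.from_bytes(raw[i:i + 4], "little")
        seeds.append(value | 1)                    # assumption: prefer odd, non-zero seeds
    return seeds

seeds = auto_seed()
print([hex(s) for s in seeds])
```

The quality of such seeds depends on the timer resolution of the machine, so this is fine for games but should not be used where unpredictability matters.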

johnsa · September 28, 2018, 06:31:10 PM

For Crypto RNG, rdrand.. yep.. but it's slow as heck.

For PRNG, the best option at the moment is XoroShiro128+
Its distribution and statistical characteristics are excellent and its performance is fantastic: sub-nanosecond per result.
I've been using this approach with much success over standard PRNG algos both in terms of quality and performance.
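
For reference, one step of xoroshiro128+ looks roughly like this in Python. Treat it as a sketch: the rotation and shift constants 24, 16, 37 are those of the 2018 reference C code as I remember it (the 2016 version used 55, 14, 36), so check them against the reference before relying on this. The function names and the conversion to a double from the top 53 bits are my choices.

```python
MASK64 = (1 << 64) - 1

def rotl64(x: int, k: int) -> int:
    """Rotate a 64-bit value left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64

def xoroshiro128plus_next(s0: int, s1: int):
    """Return (result, new_s0, new_s1) for one generator step."""
    result = (s0 + s1) & MASK64
    s1 ^= s0
    new_s0 = rotl64(s0, 24) ^ s1 ^ ((s1 << 16) & MASK64)
    new_s1 = rotl64(s1, 37)
    return result, new_s0, new_s1

def to_unit_double(x: int) -> float:
    """Use the top 53 bits of a 64-bit result as a REAL8 in [0.0, 1.0)."""
    return (x >> 11) * 2.0 ** -53

result, s0, s1 = xoroshiro128plus_next(1, 2)
print(hex(result), hex(s0), hex(s1), to_unit_double(result))
```

The state must not be all zero, and the lowest bits of the sum are the weakest, which is why the conversion keeps the high bits.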

Another alternative, depending on your application, is to use low-discrepancy sequences such as Halton, which in some cases give much better results than a PRNG (better convergence rates, for example).
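
A Halton sequence is short enough to sketch here. This is a plain radical-inverse implementation in Python, with bases 2 and 3 chosen as the usual example for 2D points; the function name is mine.

```python
def halton(index: int, base: int) -> float:
    """Radical inverse of index in the given base: mirror its digits around the point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result

# First few 2D points, using base 2 for x and base 3 for y.
points = [(halton(i, 2), halton(i, 3)) for i in range(1, 6)]
print(points)
```

Unlike a PRNG, the sequence is fully deterministic and fills the unit interval evenly, which is what helps convergence in integration-style workloads.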

Siekmanski · September 28, 2018, 08:05:20 PM

Another fast one is the PCG Family: http://www.pcg-random.org/

jj2007 · September 28, 2018, 08:57:25 PM

Quote from: johnsa on September 28, 2018, 06:31:10 PM
For PRNG, the best option at the moment is XoroShiro128+

As Marinus mentioned already, there is PCG32. They claim it's better than XoroShiro128, as proven with PractRand. It's a long and controversial issue, though.

The MASM Forum