I did a quick knife and fork on the nrandom algo that Jaymeson Trudgen wrote years ago. It has generally been a good performer over a long period, but its range is still an unsigned DWORD.
I wonder if anyone has a decent 64 bit random number generator written in 64 bit assembler that I could add to the library.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
nrandom PROC base:QWORD
; Park Miller random number generator
; -----------------------------------------
; original code written by Jaymeson Trudgen
; minor modification on the recommendation
; of Park and Miller 1993
; range is unsigned DWORD
; -----------------------------------------
mov rax, nrandom_seed
; ****************************************
test eax, 80000000h
jz @F
add eax, 7fffffffh
@@:
; ****************************************
xor edx, edx
mov ecx, 127773
div ecx
mov ecx, eax
mov eax, 48271 ; suggested modification by Park and Miller : 16807 = old value
mul edx
mov edx, ecx
mov ecx, eax
mov eax, 2836
mul edx
sub ecx, eax
xor edx, edx
mov eax, ecx
mov nrandom_seed, rcx
div base
mov eax, edx
ret
nrandom ENDP
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
nseed proc TheSeed:QWORD
.data
nrandom_seed dq 12345678
.code
mov rax, TheSeed
mov nrandom_seed, rax
ret
nseed endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
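For reference, here is a minimal C sketch of the Park-Miller "minimal standard" step using Schrage's decomposition, which is what the div/mul/sub sequence in nrandom is built on. The names pm_seed, pm_next and pm_next_ref are mine and not part of any library. One thing that may be worth checking in the listing: Schrage's method wants q = m / a and r = m mod a, and 127773 / 2836 are the values that belong to the old multiplier 16807, while for 48271 they work out to 44488 and 3399. The sketch assumes m = 2^31 - 1 and a seed in the range 1 to m - 1.

```c
#include <assert.h>
#include <stdint.h>

#define PM_M 2147483647u      /* modulus, 2^31 - 1 */
#define PM_A 48271u           /* multiplier suggested by Park and Miller in 1993 */
#define PM_Q (PM_M / PM_A)    /* 44488 */
#define PM_R (PM_M % PM_A)    /* 3399 */

uint32_t pm_seed = 12345678u; /* hypothetical state variable */

/* One step of seed = (a * seed) mod m without needing a 64-bit product. */
uint32_t pm_next(void)
{
    assert(pm_seed > 0 && pm_seed < PM_M);
    uint32_t hi = pm_seed / PM_Q;
    uint32_t lo = pm_seed % PM_Q;
    /* Both partial products stay below 2^31, so the signed math is safe. */
    int32_t t = (int32_t)(PM_A * lo) - (int32_t)(PM_R * hi);
    if (t < 0)
        t += (int32_t)PM_M;
    pm_seed = (uint32_t)t;
    return pm_seed;
}

/* Reference step with a 64-bit product, only for cross-checking. */
uint32_t pm_next_ref(uint32_t seed)
{
    return (uint32_t)(((uint64_t)PM_A * seed) % PM_M);
}
```

Stepping pm_next() and comparing each value against pm_next_ref() on the previous seed is a quick way to confirm the decomposition constants match the multiplier.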
I did a quick test with Rand64(), an adaption of Alex Bagayev's (aka Antariy) algo (http://www.masmforum.com/board/index.php?topic=11679.0), and it seems to work in 64-bit land:
1 8334395503510177650
2 1329069443230085730
3 2521605505399485735
4 -2513648910981773731
5 -6348034448820748366
6 -3307815091408750516
7 4066175164584430395
8 -3922227355712582401
9 428641309453789440
10 -8910786329316853516
11 -8070429261690835801
12 6730337785625512305
13 -7805041995582204841
14 -5974524378947664316
15 4637574160357525215
This code was assembled with ml64 in 64-bit format
It is ultrafast, and randomness is excellent, as tested with the Diehard and ENT suites.
https://en.wikipedia.org/wiki/Xorshift
PellesC output obj
extern s: qword
xorshift128plus PROC
mov rax, qword ptr [s] ; 0000 _ 48: 8B. 05, 00000000(rel)
mov rdx, qword ptr [s+8H] ; 0007 _ 48: 8B. 15, 00000008(rel)
mov qword ptr [s], rdx ; 000E _ 48: 89. 15, 00000000(rel)
mov rcx, rax ; 0015 _ 48: 89. C1
shl rcx, 23 ; 0018 _ 48: C1. E1, 17
xor rax, rcx ; 001C _ 48: 31. C8
mov rcx, rax ; 001F _ 48: 89. C1
xor rcx, rdx ; 0022 _ 48: 31. D1
shr rax, 17 ; 0025 _ 48: C1. E8, 11
xor rcx, rax ; 0029 _ 48: 31. C1
mov rax, rdx ; 002C _ 48: 89. D0
shr rax, 26 ; 002F _ 48: C1. E8, 1A
xor rcx, rax ; 0033 _ 48: 31. C1
mov qword ptr [s+8H], rcx ; 0036 _ 48: 89. 0D, 00000008(rel)
mov rax, qword ptr [s+8H] ; 003D _ 48: 8B. 05, 00000008(rel)
add rax, rdx ; 0044 _ 48: 01. D0
ret ; 0047 _ C3
xorshift128plus ENDP
Good reference, TWell, thank you. Bit shift pseudorandom number generators must be ultra light fast. 8)
From Wikipedia:
The Mersenne Twister is a pseudorandom number generator (PRNG). It is by far the most widely used general-purpose PRNG. Its name derives from the fact that its period length is chosen to be a Mersenne prime.
The Mersenne Twister was developed in 1997 by Makoto Matsumoto (松本 眞) and Takuji Nishimura (西村 拓士). It was designed specifically to rectify most of the flaws found in older PRNGs. It was the first PRNG to provide fast generation of high-quality pseudorandom integers.
The most commonly used version of the Mersenne Twister algorithm is based on the Mersenne prime 2^19937−1. The standard implementation of that, MT19937, uses a 32-bit word length. There is another implementation that uses a 64-bit word length, MT19937-64; it generates a different sequence.
Below is based on: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/C-LANG/mt19937-64.c
Tested and obtained the same results as the original in C:
; uasm64 -c -win64 -Zp8 MT64asm.asm
; link /ENTRY:main /SUBSYSTEM:console mt64asm.obj
option casemap:none
option frame:auto
OPTION WIN64:2
NN equ 312
MM equ 156
MATRIX_A equ 0B5026F5AA96619E9h
UM equ 0FFFFFFFF80000000h
LM equ 07FFFFFFFh
MULT equ 6364136223846793005
includelib \masm32\lib64\msvcrt.lib
printf proto :ptr, :vararg
.data
mt dq NN dup (0)
mti qword NN+1
mag01 dq 0, MATRIX_A
;KEYS dq 123456789ABCDEFh, 23456789ABCDEF1h, 3456789ABCDEF12h, 456789ABCDEF123h
KEYS dq 12345h, 23456h, 34567h, 45678h ; For testing, the same as in the C code
format0 db "1000 outputs of genrand64_int64()",13,10,0
format1 db 13,10,"1000 outputs of genrand64_real2()",13,10,0
format2 db "%20llu ",0
format3 db "%10.8f ",0
formatcr db 13,10,0
.const
c1 real8 1.1102230246251566636831481088739e-16
c2 real8 1.1102230246251565404236316680908e-16
c3 real8 0.5
c4 real8 2.2204460492503130808472633361816e-16
.code
init_genrand64 proc uses rsi seed : qword
mov rsi, offset mt
mov seed, rcx
mov qword ptr [rsi], rcx
mov r8, 1
.while r8 < NN
mov r9, r8
dec r9
mov r10, qword ptr [rsi+r9*sizeof qword]
mov r11, r10 ; save it
shr r10, 62
xor r11, r10
mov rax, MULT
mul r11
add rax, r8
mov qword ptr [rsi+r8*sizeof qword ], rax
inc r8
.endw
mov mti, r8
ret
init_genrand64 endp
init_by_array64 proc uses rsi init_key : ptr, key_len : qword
mov init_key, rcx
mov key_len, rdx
INVOKE init_genrand64, 19650218
mov rcx, init_key
mov r8, 1
mov r9, 0
mov r10, 0
.if key_len<NN
mov r10, NN
.else
mov r10, key_len
.endif
mov rsi, offset mt
.while r10>0
mov rax, qword ptr [rsi+r8*sizeof qword - sizeof qword]
mov r11, rax ; save
shr r11, 62
xor rax, r11
mov r11, 3935559000370003845
mul r11
mov r11, qword ptr [rsi+r8*sizeof qword]
xor r11, rax
add r11, qword ptr[rcx + r9*sizeof qword]
add r11, r9
mov qword ptr [rsi+r8*sizeof qword], r11
inc r8
inc r9
.if r8>=NN
mov rax, qword ptr [rsi+(NN-1)*sizeof qword]
mov qword ptr [rsi], rax
mov r8, 1
.endif
.if r9>=key_len
mov r9,0
.endif
dec r10
.endw
mov r10, NN-1
.while r10>0
mov rax, qword ptr [rsi+r8*sizeof qword - sizeof qword]
mov r11, rax
shr r11, 62
xor rax, r11
mov r11, 2862933555777941757
mul r11
mov r11, qword ptr [rsi+r8*sizeof qword]
xor r11, rax
sub r11, r8
mov qword ptr [rsi+r8*sizeof qword], r11
inc r8
.if r8>=NN
mov rax, qword ptr [rsi+(NN-1)*sizeof qword]
mov qword ptr [rsi], rax
mov r8, 1
.endif
dec r10
.endw
mov rax, 1 shl 63
mov qword ptr [rsi], rax
ret
init_by_array64 endp
genrand64_int64 proc uses rsi rdi
mov rsi, offset mt
.if mti>=NN
.if mti==NN+1
invoke init_genrand64, 5489
.endif
mov r8, 0
mov rdi, offset mag01
.while r8<(NN-MM)
mov rax, qword ptr [rsi+r8*sizeof qword]
and rax, UM
mov rcx, qword ptr [rsi+r8*sizeof qword+sizeof qword]
and rcx, LM
or rax, rcx ; x
mov rcx, rax
and rcx, 1
mov rdx, qword ptr [rdi+rcx*sizeof qword] ; mag01[(int)(x & 1ULL)];
mov rcx, rax
shr rcx, 1
xor rcx, rdx ; (x >> 1) ^ mag01[(int)(x & 1ULL)];
mov rax, qword ptr [rsi+r8*sizeof qword+MM*sizeof qword ]
xor rax, rcx
mov qword ptr [rsi+r8*sizeof qword], rax
inc r8
.endw
.while r8<(NN-1)
mov rax, qword ptr [rsi+r8*sizeof qword]
and rax, UM
mov rcx, qword ptr [rsi+r8*sizeof qword+sizeof qword]
and rcx, LM
or rax, rcx ; x
mov rcx, rax
and rcx, 1
mov rdx, qword ptr [rdi+rcx*sizeof qword] ; mag01[(int)(x & 1ULL)];
mov rcx, rax
shr rcx, 1
xor rcx, rdx ; (x >> 1) ^ mag01[(int)(x & 1ULL)];
mov rax, qword ptr [rsi + r8*sizeof qword+(MM - NN)*sizeof qword]
xor rax, rcx
mov qword ptr [rsi+r8*sizeof qword], rax
inc r8
.endw
mov rax, qword ptr [rsi+(NN-1)*sizeof qword]
and rax, UM
mov rcx, qword ptr [rsi]
and rcx, LM
or rax, rcx ; x
mov rcx, rax
and rcx, 1
mov rdx, qword ptr [rdi+rcx*sizeof qword] ; mag01[(int)(x & 1ULL)];
mov rcx, rax
shr rcx, 1
xor rcx, rdx ; (x >> 1) ^ mag01[(int)(x & 1ULL)];
mov rax, qword ptr [rsi+(MM-1)*sizeof qword]
xor rax, rcx
mov qword ptr [rsi+(NN-1)*sizeof qword], rax
mov mti,0
.endif
mov r8, mti
mov rax, qword ptr [rsi+r8*sizeof qword]
inc r8
mov mti, r8
mov rcx, rax ; save
shr rax, 29
mov rdx, 5555555555555555h
and rax, rdx
xor rcx, rax
mov rax, rcx
shl rax, 17
mov rdx, 71D67FFFEDA60000h
and rax, rdx
xor rcx, rax
mov rax, rcx
shl rax, 37
mov rdx, 0FFF7EEE000000000h
and rax, rdx
xor rcx, rax
mov rax, rcx
shr rcx, 43
xor rax, rcx
ret
genrand64_int64 endp
genrand64_int63 proc
INVOKE genrand64_int64
shr rax, 1
ret
genrand64_int63 endp
genrand64_real1 proc
INVOKE genrand64_int64
shr rax, 11
cvtsi2sd xmm0,rax
mulsd xmm0, c1
ret
genrand64_real1 endp
genrand64_real2 proc
INVOKE genrand64_int64
shr rax, 11
cvtsi2sd xmm0,rax
mulsd xmm0, c2
ret
genrand64_real2 endp
genrand64_real3 proc
INVOKE genrand64_int64
shr rax, 12
cvtsi2sd xmm0,rax
addsd xmm0, c3
mulsd xmm0, c4
ret
genrand64_real3 endp
main proc
invoke init_by_array64, offset KEYS, LENGTHOF KEYS
mov rbx, 0
mov r12,0
invoke printf, offset format0
.while rbx<1000
invoke genrand64_int64
invoke printf, offset format2, rax
inc r12
.if r12==5
mov r12,0
invoke printf, offset formatcr
.endif
inc rbx
.endw
mov rbx, 0
mov r12,0
invoke printf, offset format1
.while rbx<1000
invoke genrand64_real2
movq rdx, xmm0
invoke printf, offset format3, rdx
inc r12
.if r12==5
mov r12,0
invoke printf, offset formatcr
.endif
inc rbx
.endw
ret
main endp
end
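As a cross-check for the c1 to c4 constants in the .const section, here is a small C sketch of the three real-number conversions as I read them in the mt19937-64.c reference linked above. The function names to_real1, to_real2 and to_real3 are mine; each takes a raw 64-bit output rather than calling the generator. Note that cvtsi2sd is a signed conversion, which is safe in the listing only because the value has already been shifted right by 11 or 12 bits.

```c
#include <stdint.h>

/* [0,1]: top 53 bits scaled by 1/(2^53 - 1), corresponds to c1 */
double to_real1(uint64_t x)
{
    return (double)(x >> 11) * (1.0 / 9007199254740991.0);
}

/* [0,1): top 53 bits scaled by 1/2^53, corresponds to c2 */
double to_real2(uint64_t x)
{
    return (double)(x >> 11) * (1.0 / 9007199254740992.0);
}

/* (0,1): top 52 bits plus 0.5, scaled by 1/2^52, corresponds to c3 and c4 */
double to_real3(uint64_t x)
{
    return ((double)(x >> 12) + 0.5) * (1.0 / 4503599627370496.0);
}
```

Feeding the extreme inputs 0 and 0xFFFFFFFFFFFFFFFF through each function is a cheap way to confirm the interval ends behave as the comments claim.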
I have so far tested the version that Tim posted. It runs OK and seems fast enough, but the results under ENT are very ordinary. I have played with twiddling the output and improved the result, but it's not what you would call good. Test piece attached: run the executable "xorshift.exe"; the batch file runs ENT, which is called from the test piece.
It's difficult to find a high quality PRNG. Here is the (mediocre) ENT output for xorshift:
Entropy = 7.762594 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 2 percent.
Chi square distribution for 1000000 samples is 536885.84, and randomly
would exceed this value 0.01 percent of the times.
Arithmetic mean value of data bytes is 128.0349 (127.5 = random).
Monte Carlo value for Pi is 3.164364657 (error 0.72 percent).
Serial correlation coefficient is -0.010624 (totally uncorrelated = 0.0).
For comparison:
617 ms incl. writing the file, M$ RtlRandomEx()
588 ms without writing
2395 ms incl. writing the file, MasmBasic Rand()
7162 µs without writing
############ ENT results RtlRandomEx32, with swap:
Entropy = 7.999985 bits per byte.
Optimum compression would reduce the size
of this 11468800 byte file by 0 percent.
Chi square distribution for 11468800 samples is 244.22, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.4928 (127.5 = random).
Monte Carlo value for Pi is 3.142348334 (error 0.02 percent).
Serial correlation coefficient is -0.000229 (totally uncorrelated = 0.0).
############ ENT results Rand64:
Entropy = 7.999984 bits per byte.
Optimum compression would reduce the size
of this 11468800 byte file by 0 percent.
Chi square distribution for 11468800 samples is 252.68, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.5000 (127.5 = random).
Monte Carlo value for Pi is 3.142249980 (error 0.02 percent).
Serial correlation coefficient is 0.001482 (totally uncorrelated = 0.0).
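To make the ENT figures a bit less of a black box, here is a minimal C sketch of two of them as I understand ENT's byte mode: the chi-square statistic over the 256 byte values and the arithmetic mean. The function names byte_chi_square and byte_mean are mine and this is not ENT's own code. With 255 degrees of freedom a uniform source tends to land somewhere near 255, which is why 244.22 and 252.68 look healthy and 536885.84 does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Chi-square of observed byte counts against a flat expectation of len/256. */
double byte_chi_square(const uint8_t *buf, size_t len)
{
    assert(len > 0);
    size_t count[256] = {0};
    for (size_t i = 0; i < len; ++i)
        count[buf[i]]++;

    double expected = (double)len / 256.0;
    double chi = 0.0;
    for (int v = 0; v < 256; ++v) {
        double d = (double)count[v] - expected;
        chi += d * d / expected;
    }
    return chi;
}

/* Arithmetic mean of the bytes; 127.5 is the ideal for uniform data. */
double byte_mean(const uint8_t *buf, size_t len)
{
    assert(len > 0);
    double sum = 0.0;
    for (size_t i = 0; i < len; ++i)
        sum += buf[i];
    return sum / (double)len;
}
```

A buffer holding every byte value equally often gives a chi-square of exactly 0 and a mean of 127.5, which is a handy sanity check before pointing it at random.dat.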
There is a great collection here at CAcert Research Lab (http://www.cacert.at/cgi-bin/rngresults).
JJ,
I downloaded the "DualRandMb.zip" but there is no source for the DualRand64 executable. I have looked at the DualRand.inc file, but it contains a very messy looking macro with no context.
This is the source (DualRand.asm):
include \Masm32\MasmBasic\Res\JBasic.inc ; ## console demo, builds in 32- or 64-bit mode with ML, AsmC, JWasm, HJWasm ##
include DualRand.inc ; excerpt from MasmBasic.inc
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
xor ebx, ebx
@@:
inc ebx
Print Str$("%i\t", rbx) ; counter
void Rand64() ; returns eax and edx
shl rax, 32
or rax, rdx
Print Str$("%lli\n", rax) ; random number
cmp ebx, 20
jb @B
Inkey Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
EndOfCode
The file DualRand.inc contains the Rand() macro, which is meant for the 32-version of MasmBasic; therefore it is not included in JBasic.inc, so I had to extract it to make it available in the dual 64/32-bit version.
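The shl rax, 32 / or rax, rdx pair in the demo glues two 32-bit results into one 64-bit value. A minimal C sketch of the same idea follows; combine_halves is a hypothetical name, and I am assuming the two halves come from separate 32-bit outputs. The assembly version only works as intended if the upper 32 bits of rdx are zero when the or executes.

```c
#include <stdint.h>

/* Build one 64-bit value from two 32-bit halves, high half first. */
uint64_t combine_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

The cast before the shift matters: shifting a 32-bit value left by 32 would be undefined in C.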
Since I have a suspicion that you will not install this exclusive software package (http://masm32.com/board/index.php?topic=6412.msg68927#msg68927), I attach a version with an int 3 before the void Rand64(), and a nops 8 after that. You know how to use it, and you will be surprised ;-)
While the Xorshift algo is clearly an inferior specimen, it is indeed fast.
Being fast does not mean that it can't be even faster. This raises again the point of artificial intelligence versus human intelligence.
The Pelles compiler produced the logical output; most ASM programmers would do something very similar, myself included.
Visual Studio 2017 used AI techniques, saving 2 instructions and clearly gaining in speed. Even with only 15 instructions (versus 17 for Pelles), it took me a while to understand why it still works.
There are dozens of int3 lines in the disassembly and NO, I will not be messing up a development system with MB. Looks like you get some interesting things done, but with no access to any assembler source and no way to independently test it, any comparisons are meaningless. In an assembler forum I post assembler code and can routinely translate between almost any assembler, but I draw the line at a closed encrypted system written in RTF and only supplied as a binary where the content is unfindable, as it is simply a waste of time.
This is a test piece in 32-bit MASM using NaN's old "nrandom" algo with post processing. NaN's original algo runs OK and is useful for small counts over a predetermined range, say 0 to 10, but its level of randomness is poor. Give it a post-process massage on a large sample (> 1 million) and it produces results like this.
Entropy = 7.999947 bits per byte.
Optimum compression would reduce the size
of this 4000000 byte file by 0 percent.
Chi square distribution for 4000000 samples is 291.42, and randomly
would exceed this value 10.00 percent of the times.
Arithmetic mean value of data bytes is 127.4818 (127.5 = random).
Monte Carlo value for Pi is 3.137529138 (error 0.13 percent).
Serial correlation coefficient is 0.000348 (totally uncorrelated = 0.0).
Press any key to continue . . .
Now the version that Tim posted does not have a range specifier, so its use will be limited to making very large pads, but it is fast enough to do some post processing on it. When I get a bit further in front I would like to try out the Mersenne Twister that AW posted, as it is a well known random algo.
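On the missing range specifier: one common way to bolt a range onto a raw 64-bit generator is rejection sampling, which avoids the slight bias a plain modulo gives. The sketch below is only an illustration under my own assumptions: xs_next is the xorshift128+ step written out in C, the seed values are arbitrary nonzero constants, and rand_below is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary nonzero starting state, for illustration only. */
uint64_t xs_state[2] = { 0x0123456789ABCDEFull, 0xFEDCBA9876543210ull };

/* xorshift128+ step, same shifts as the listing earlier in the thread. */
uint64_t xs_next(void)
{
    uint64_t x = xs_state[0];
    uint64_t const y = xs_state[1];
    xs_state[0] = y;
    x ^= x << 23;
    xs_state[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
    return xs_state[1] + y;
}

/* Uniform value in [0, n) by rejecting the short leftover slice of the range. */
uint64_t rand_below(uint64_t n)
{
    assert(n > 0);
    uint64_t threshold = ((uint64_t)0 - n) % n;   /* equals 2^64 mod n */
    for (;;) {
        uint64_t r = xs_next();
        if (r >= threshold)
            return r % n;
    }
}
```

For small n the loop almost never repeats, so the cost over a bare modulo is one extra compare in the usual case.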
Here is the 32 bit test piece.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
.data?
nrandom_seed dd ? ; global random seed variable
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
icnt equ <1000000> ; iteration count
rang equ <10000000> ; range = 0 to rang
main proc
LOCAL cntr :DWORD
LOCAL buff[64]:BYTE
LOCAL pbuf :DWORD
LOCAL pMem :DWORD
LOCAL rval :DWORD
LOCAL ivar :DWORD
LOCAL buf2[64]:BYTE
push ebx
push esi
push edi
mov pbuf, ptr$(buff)
fn GetTickCount
rol eax, 29
mov nrandom_seed, eax
mov cntr, icnt ; iteration count
mov eax, cntr
lea eax, [eax*4]
mov pMem, alloc(eax) ; allocate iteration count * 4
mov esi, pMem
@@:
invoke nrandom,rang
mov [esi], eax
add esi, 4
sub cntr, 1
cmp cntr, 0
ja @B
; --------------------------------------
mov ivar, 0
REPEAT 7
add ivar, 4
invoke SleepEx,100,0
fn GetTickCount
mov ecx, ivar
ror eax, cl
mov nrandom_seed, eax ; multiple re-seeding
mov cntr, icnt ; iteration count
mov esi, pMem
@@:
invoke nrandom,rang
bswap eax
rol eax,3
xor [esi], eax
add esi, 4
sub cntr, 1
jnz @B
; cmp cntr, 0
; ja @B
ENDM
; --------------------------------------
mov rval, OutputFileA("random.dat",pMem,icnt*4)
fn WinExec,"test.bat",1
free pMem
pop edi
pop esi
pop ebx
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
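The post-processing in the REPEAT block boils down to: take a fresh value, byte-swap it, rotate it left by 3, and XOR it into the slot written by the earlier passes. A rough C sketch of one such pass is below. The names bswap32, rotl32, mix32 and xor_pass are mine, and the fresh values are passed in as an array so the sketch does not depend on any particular generator or on the reseeding done in the test piece.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (what BSWAP does). */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Rotate left by k bits (what ROL does). */
uint32_t rotl32(uint32_t x, unsigned k)
{
    k &= 31u;
    return k ? (x << k) | (x >> (32u - k)) : x;
}

/* The per-value transform used in the loop: bswap, then rol 3. */
uint32_t mix32(uint32_t r)
{
    return rotl32(bswap32(r), 3);
}

/* One pass: XOR the transformed fresh values into the existing buffer. */
void xor_pass(uint32_t *buf, const uint32_t *fresh, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] ^= mix32(fresh[i]);
}
```

Running several passes with differently seeded input, as the test piece does seven times, just repeats xor_pass over the same buffer.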
Quote from: hutch-- on July 31, 2017, 08:04:16 PM
There are dozens of int3 lines in the disassembly and NO, I will not be messing up a development system with MB.
Sigh...
0000000140001002 | C8 50 00 00 | enter 50, 0 |
0000000140001006 | E8 84 01 00 00 | call 14000118F |
000000014000100B | 33 DB | xor ebx, ebx |
000000014000100D | FF C3 | inc ebx |
000000014000100F | 48 8D 15 FA 02 00 00 | lea rdx, qword ptr ds:[140001310] | 140001310:"%i\t"
0000000140001016 | 4C 8B C3 | mov r8, rbx |
0000000140001019 | 48 8D 0D 98 04 00 00 | lea rcx, qword ptr ds:[1400014B8] |
0000000140001020 | FF 15 D2 05 00 00 | call qword ptr ds:[1400015F8] |
0000000140001026 | 48 89 05 83 04 00 00 | mov qword ptr ds:[1400014B0], rax |
000000014000102D | 48 89 75 F8 | mov qword ptr ss:[rbp-8], rsi |
0000000140001031 | 48 89 5D F0 | mov qword ptr ss:[rbp-10], rbx |
0000000140001035 | 48 BE B8 14 00 40 01 00 00 00 | movabs rsi, 1400014B8 |
000000014000103F | 48 C7 C1 F5 FF FF FF | mov rcx, FFFFFFFFFFFFFFF5 |
0000000140001046 | FF 15 B4 05 00 00 | call qword ptr ds:[140001600] | scroll down, there is much more...
000000014000104C | 48 93 | xchg rax, rbx |
000000014000104E | 48 8B CE | mov rcx, rsi |
0000000140001051 | FF 15 B1 05 00 00 | call qword ptr ds:[140001608] |
0000000140001057 | 45 33 D2 | xor r10d, r10d |
000000014000105A | 4C 89 54 24 20 | mov qword ptr ss:[rsp+20], r10 |
000000014000105F | 4C 8D 0D 32 04 00 00 | lea r9, qword ptr ds:[140001498] |
0000000140001066 | 4C 8B C0 | mov r8, rax |
0000000140001069 | 48 8B D6 | mov rdx, rsi |
000000014000106C | 48 8B CB | mov rcx, rbx |
000000014000106F | FF 15 9B 05 00 00 | call qword ptr ds:[140001610] |
0000000140001075 | 48 8D 55 F8 | lea rdx, qword ptr ss:[rbp-8] | rdx:EntryPoint
0000000140001079 | 48 3B D4 | cmp rdx, rsp | rdx:EntryPoint
000000014000107C | 75 00 | jne 14000107E | jmp $0
000000014000107E | 48 8B 75 F8 | mov rsi, qword ptr ss:[rbp-8] |
0000000140001082 | 48 8B 5D F0 | mov rbx, qword ptr ss:[rbp-10] |
0000000140001086 | CC | int3 |<<<<<<<<< THERE IT IS!! <<<<<<<<<<<
Yes, there are dozens of int 3, right after the ret. That's what assemblers usually do for padding, but OK, if you don't trust me after a bit more than ten years on this forum, then you better use the xorshift algo.
:biggrin:
> That's what assemblers usually do for padding, but OK, if you don't trust me after a bit more than ten years on this forum, then you better use the xorshift algo.
I will, I will !!!! :P
"xorshift2".
For a 1 instruction addition, it went from a dud to a kick ass random generator.
Entropy = 7.999979 bits per byte.
Optimum compression would reduce the size
of this 8000000 byte file by 0 percent.
Chi square distribution for 8000000 samples is 237.28, and randomly
would exceed this value 75.00 percent of the times.
Arithmetic mean value of data bytes is 127.5431 (127.5 = random).
Monte Carlo value for Pi is 3.139971785 (error 0.05 percent).
Serial correlation coefficient is -0.000246 (totally uncorrelated = 0.0).
Press any key to continue . . .
This is the only mod necessary.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
irand PROC
mov rax, random_seed
mov rdx, random_seed+8h
mov random_seed, rdx
mov rcx, rax
shl rcx, 23
xor rax, rcx
mov rcx, rax
xor rcx, rdx
shr rax, 17
xor rcx, rax
mov rax, rdx
shr rax, 26
xor rcx, rax
mov random_seed+8h, rcx
mov rax, random_seed+8h
add rax, rdx
bswap rax ; <<<< minor mod here. Set the most variable end first
ret
irand ENDP
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
It can be improved but not by much. Source and test piece attached. Run the executable, the batch file is to run ENT.
So with PellesC:
#include <stdint.h>
#include <intrin.h>
/* The state must be seeded so that it is not zero */
uint64_t s[2];
uint64_t xorshift128plus(void) {
uint64_t x = s[0];
uint64_t const y = s[1];
s[0] = y;
x ^= x << 23; // a
s[1] = x ^ y ^ (x >> 17) ^ (y >> 26); // b, c
return _bswap64(s[1] + y);
}
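The comment about the state not being zero deserves a seeding routine. One approach often paired with the xorshift family is to expand a single 64-bit seed through SplitMix64; the sketch below is my own illustration of that, not something posted in this thread. The names splitmix64 and seed_xorshift128plus are assumptions, and the state is passed in as a parameter rather than using the global s.

```c
#include <stdint.h>

/* SplitMix64 step: advances *state and returns a well-mixed 64-bit value. */
uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ull);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
    return z ^ (z >> 31);
}

/* Fill a two-word xorshift128+ state from one seed, never leaving it all zero. */
void seed_xorshift128plus(uint64_t state[2], uint64_t seed)
{
    uint64_t sm = seed;
    state[0] = splitmix64(&sm);
    state[1] = splitmix64(&sm);
    if ((state[0] | state[1]) == 0)
        state[1] = 1;   /* guard: the all-zero state would stay zero forever */
}
```

Calling seed_xorshift128plus(s, some_seed) once before the first xorshift128plus() call would cover the nonzero requirement.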
JJ,
This is unintelligible.
enter 50, 0
call 14000118F ; calling address beyond end of sample
xor ebx, ebx
inc ebx
lea rdx, QWORD PTR [140001310] ; 140001310:"%i\t"
mov r8, rbx
lea rcx, QWORD PTR [1400014B8]
call QWORD PTR [1400015F8] ; calling address beyond end of sample
mov QWORD PTR [1400014B0], rax
mov QWORD PTR [rbp-8], rsi
mov QWORD PTR [rbp-10], rbx
movabs rsi, 1400014B8
mov rcx, FFFFFFFFFFFFFFF5
call QWORD PTR [140001600] ; scroll down, there is much more...
xchg rax, rbx
mov rcx, rsi
call QWORD PTR [140001608]
xor r10d, r10d
mov QWORD PTR [rsp+20], r10
lea r9, QWORD PTR [140001498]
mov r8, rax
mov rdx, rsi
mov rcx, rbx
call QWORD PTR [140001610] ; calling address beyond end of sample
lea rdx, QWORD PTR [rbp-8] ; rdx:EntryPoint
cmp rdx, rsp ; rdx:EntryPoint
jne lbl0 ; jmp $0
lbl0:
mov rsi, QWORD PTR [rbp-8]
mov rbx, QWORD PTR [rbp-10]
; Urrrrgh, how does it exit. Normally with the "enter" leading mnemonic it has a corresponding leave | ret
Quote from: hutch-- on July 31, 2017, 08:56:49 PM
For a 1 instruction addition, it went from a dud to a kick ass random generator.
DualRand.inc, line 48:
bswap eax ; we need the high order bits
And the timings are already quite ok:
811 ticks for irand
687 ticks for Rand64
This code was assembled with ml64 in 64-bit format
You see, it wasn't that difficult :P
Quote from: hutch-- on July 31, 2017, 08:56:49 PM
It can be improved but not by much. Source and test piece attached. Run the executable, the batch file is to run ENT.
Hutch,
I get this build error using www.masm32.com/download/x64make.zip from the Latest build environment for ML64 (http://masm32.com/board/index.php?topic=5631.0):
Microsoft (R) Macro Assembler (x64) Version 10.00.30319.01
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: xorshift.asm
xorshift.asm(56) : error A2008:syntax error : .exit
Does it require a new version of the macros that you have not yet posted? Or did you post it somewhere else?
(did it build for anybody else? if yes, which settings/macro files?)
> Yes and don't know. I have already posted the 64 bit random algo, the rest may come as I upgrade the libraries and macros.
These are the most recent versions I have posted:
http://masm32.com/board/index.php?topic=6365.0
The 64 bit version is a work in progress, the random app has code that has yet to be published so if you really need it, you can use the option that you offered me, disassemble it.
Quote from: hutch-- on August 05, 2017, 11:04:07 PM
you can use the option that you offered me, disassemble it.
Your sense of humour is back, great :t
qWord recommends using the Windows Cryptography API. Code translated to Masm64:
http://masm32.com/board/index.php?topic=2735.msg28966#msg28966
RandomBytes PROC dwLength:QWORD,pBuffer:QWORD
LOCAL dummy:QWORD
LOCAL hProvider:QWORD
sub rsp,5*8+8
mov dwLength,rcx
mov pBuffer,rdx
invoke CryptAcquireContext,ADDR hProvider,0,0,\
PROV_RSA_FULL,\
CRYPT_VERIFYCONTEXT or CRYPT_SILENT
invoke CryptGenRandom,hProvider,dwLength,pBuffer
invoke CryptReleaseContext,hProvider,0
ret
RandomBytes ENDP
Thanks Erol. :t
Nice example, Erol :t
CryptGenRandom is very slow, though:
This code was assembled with ml64 in 64-bit format
result -8032366409749840727
result 87096612809990884
result -3122102231650220507
result 6210554543873536060
result 7239442655864793548
result 3289759298143981941
46067 ticks for CryptGenRandom
result 9056870149978556565
result -1890820636788552871
result 2387620311255054330
result 37962361222013295
result 835162364812257345
result 4123339310970107070
0 ticks for Rand64()
Interesting: The 64-bit version is exactly twice as fast - as if it was called twice internally ::)