The problem had been with the duration of saving the file output so I set up a method that uses memory mapped files where you write directly to the open mapped memory and it just about excludes the file save time which was far longer than the random generation. The random algorithm used is the library "irand" with some help from the "rrand" range version to vary the interval length of reseeding the main algo. This version has a very large capacity and has been tested up to 48 gigabytes running on my middle aged Haswell and a bit over 1 gig a second.
The app reseeds the irand algo at random intervals within a range which provides at least some brute force protection but the random pad could still theoretically be attacked by someone who spent long enough running brute force attacks with the QWORD range and get partial decryptions. What needs to be done is a multi thread or multipass version that properly protects the pad from brute force. This can be done by either the same irand algo or using a multi thread rdrand pad xorred against the first pad.
Just as an aside, the test piece that Jose did with the 64 threads was possible because the rdrand mnemonic is very slow and does not fully utilise the thread. The library "irand" algo and similar are much faster that rdrand and they saturate the thread they are contained in. It is still a useful technique because of its different random source which can be combined with a seeded algo.
The attached file is set to a low 1 gig as most 64 bit machines can run it. This much I have found with using a memory mapped file for output is that it must not exceed available memory.
Quote from: hutch-- on June 18, 2019, 04:49:07 AM
Just as an aside, the test piece that Jose did with the 64 threads was possible because the rdrand mnemonic is very slow and does not fully utilise the thread. The library "irand" algo and similar are much faster that rdrand and they saturate the thread they are contained in. It is still a useful technique because of its different random source which can be combined with a seeded algo.
Exactly, the need for multithreading implies some degree of implicit slowness.
(https://www.dropbox.com/s/pn63s0uajlzkwjx/puppies.gif?dl=1)
Lol AW suggests the threads are a lot of underdogs?
Hutch great work 7 times faster than the filewrite one 922ms,mine is a i5,that runs 2.5ghz+
maybe should try a different test that spinsup cpus to max and after that run random pad program?
,one of advantages with 64bit is you can run many threads not limited by 32bit memory alloc,if you run memory intensive cg,with each. Thread
Quote from: daydreamer on June 18, 2019, 03:49:11 PM
Lol AW suggests the threads are a lot of underdogs?
I did not invent it:
https://9gag.com/gag/aWYXDwd/multithreading
Magnus,
I am reasonably happy with the mapped file technique as it presents a writable block of memory that you can write directly to without the writefile technique and this saves a reasonable amount of time. This changes the internal logic in that instead of allocating memory and processing it, you have a linear address range that you write to which normally requires one pass to write it. The last example used multi seeding of random lengths but to properly protect the random pad it needs more than one pass to break attempts at brute forcing.
Here is the next version, it solves the problem of being vulnerable to brute force attack by doing a second xor pass which makes reconstructing the pad almost impossible. Source is attached. Something to note, if you set the pad size in excess of installed memory, it will stop the OS stone dead so if you modify the pad size, stay under your installed memory. The extra pass incurred a longer run time but it is still reasonably fast, it moved from about 900 ms on a 1 gig pad to about 1350 ms with the extra pass. I will try a multi-threaded version to see if I can reduce the timing.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
USING rsi,r12,r13,r14,r15
LOCAL hMMF :QWORD ; memory mapped file handle
LOCAL pMMF :QWORD ; pointer to mapped memory
LOCAL hFile :QWORD ; save file handle
LOCAL block :QWORD ; loop counter
SaveRegs
HighPriority
mov r13, 1024*1024*1024*1 ; the requested pad size
mov r14, r13 ; preserve original length in r14
conout "Creating random pad",lf
mov hFile, flcreate("random.pad") ; create the open file
invoke OpenMapFile,hFile,"mapfile", \
r13,ADDR hMMF,ADDR pMMF ; open the file mapping
mov r15, rv(GetTickCount) ; start timing
; *********************************************
; write data pass
; *********************************************
shr r13, 3 ; set block size from request size
call reseed ; get a seed for the random algo
rcall seed_irand,rax
mov r12, pMMF ; mapped memory address into r12
mov rsi, 1 ; set the initial reseeding count
call reseed ; get a seed for the range random algo
rcall seed_rrand,rax ; reseed the range random algo
@@:
rcall irand ; call random algo
mov QWORD PTR [r12], rax
add r12, 8 ; set next write location
sub rsi, 1 ; decrement the interval counter
jnz nxt
; ----------------------
rcall rrand, 11,1663 ; call range random for reseeding interval
mov rsi, rax
call reseed ; get a seed for the random algo
rcall seed_irand,rax ; reseed the integer random algo
; ----------------------
nxt:
sub r13, 1 ; decrement loop counter
jnz @B
; *********************************************
; xor data pass
; *********************************************
rcall rrand, 15,60
rcall SleepEx,rax,0
call reseed ; get a seed for the random algo
rcall seed_irand,rax
mov r12, pMMF ; mapped memory address into r12
mov rsi, 3 ; set the initial reseeding count
call reseed ; get a seed for the range random algo
rcall seed_rrand,rax ; reseed the range random algo
mov r13, r14
shr r13, 3
@@:
rcall irand ; call random algo
xor QWORD PTR [r12], rax
add r12, 8 ; set next write location
sub rsi, 1 ; decrement the interval counter
jnz nxt1
; ----------------------
rcall rrand, 17,4923 ; call range random for reseeding interval
mov rsi, rax
call reseed ; get a seed for the random algo
rcall seed_irand,rax ; reseed the integer random algo
; ----------------------
nxt1:
sub r13, 1 ; decrement loop counter
jnz @B
; *********************************************
rcall GetTickCount ; end timing
sub rax, r15
conout "Timing = ",str$(rax)," ms",lf
conout "File written to disk at ",str$(r14)," bytes",lf
rcall CloseMMF,pMMF,hMMF ; close the file mapping
rcall CloseHandle,hFile ; close the file
NormalPriority
waitkey
RestoreRegs
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
OpenMapFile proc hFile:QWORD,lpName:QWORD,bcnt:QWORD,pHandle:QWORD,pMemory:QWORD
LOCAL hi32 :DWORD
LOCAL lo32 :DWORD
mov lo32, r8d ; split hi and lo DWORDS of bcnt
rol r8, 32
mov hi32, r8d
invoke CreateFileMapping, \
hFile, \ ; open file handle
NULL, \
PAGE_READWRITE, \ ; read write access to memory
hi32, \ ; high DWORD of bcnt
lo32, \ ; low DWORD of bcnt
lpName ; set file object name here
mov rdx, rax ; MMF handle in RDX
mov rcx, pHandle ; address of variable in RCX
mov [rcx], rax ; write MMF handle to variable
invoke MapViewOfFile, \
rdx, FILE_MAP_WRITE,0,0,0
mov rcx, pMemory ; address of variable in RCX
mov [rcx], rax ; write start address to variable
ret
OpenMapFile endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
reseed proc
mov rax, 100 ; out of range number
cpuid ; set all regs to 0
rdtsc ; date time counter
pause ; spinlock pause
bswap rax ; reverse byte order
mov r11, rax ; store rax in r11
mov rcx, 10 ; loop count
@@:
rdtsc ; date time counter
pause ; spinlock pause
bswap rax ; reverse byte order
rol rax, 7 ; rotate left by prime
xor r11, rax ; xor rax to r11
rol r11, 5 ; rotate left by prime
sub rcx, 1 ; decrement counter
jnz @B
mov rax, r11 ; return the value in r11
ret
reseed endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
what about you unroll it so you can use vxorps 256bit ?
:skrewy:
To do what ? Have a look at the xor pass, it does almost the same as the first pass except that it xors the result to the same offset as the first pass. Now while writing in larger blocks makes sense, to take QWORD sized random numbers and load them into a larger data size, you incur the overhead of the loading which is still a number of random generator procedure calls and then the loading overhead. :winking:
CUDA has a vast random number generation library, called CURAND. However, I don't know if it feasible to use from ASM, because it does not appear to support the driver API (only the runtime API). Something I will check sometime in the future.
Hutch maybe you should try multithreaded
I lack rols so I need todo workarounds and ideas for seeds after PI,sqrt(2) to start with 128bit+,maybe use very big prime numbers
adding rdtsc value
got stucked,upper 64bit doesnt work
Magnus,
You worry me at times. If you look at the reseed algo in the examples, it only uses the low 32 bit of rdtsc and then it does a sequence of rotates and xor to always produce a 64 bit integer. I will shortly post the final version but in between I wrote a multi-threaded version that was far slower than the single thread version because the thread overhead exceeded the runtime of the original parent thread. I started with 8 threads, then 4 threads, then 2 threads and finally 1 thread and the single thread was the fastest.
Quote from: hutch-- on June 22, 2019, 11:47:09 AM
You worry me at times. If you look at the reseed algo in the examples, it only uses the low 32 bit of rdtsc and then it does a sequence of rotates and xor to always produce a 64 bit integer. I will shortly post the final version but in between I wrote a multi-threaded version that was far slower than the single thread version because the thread overhead exceeded the runtime of the original parent thread. I started with 8 threads, then 4 threads, then 2 threads and finally 1 thread and the single thread was the fastest.
just curious how fast zero threads would run :file system call to check if random pad file already exist=exit without make random pad
this memory mapped file idea is too good to not try code SIMD :icon_idea:
:nie:
> this memory mapped file idea is too good to not try code SIMD
Feel free to code a legacy SSE random algo that tests well then work out how to seed it. The assumption that trying to do everything in SSE is not sound. Note that SSE and even the later AVX were primarily designed for high speed graphics, not high speed complex calculations. The bottom line is you must be able to demonstrate that the results are,
1. random
2. not easily reproducible.
Quote from: hutch-- on June 23, 2019, 02:42:58 PM
Feel free to code a legacy SSE random algo that tests well then work out how to seed it. The assumption that trying to do everything in SSE is not sound. Note that SSE and even the later AVX were primarily designed for high speed graphics, not high speed complex calculations. The bottom line is you must be able to demonstrate that the results are,
1. random
2. not easily reproducible.
thanks Hutch for inspire me to continue and improve unfinished SSE code
I am working with improve other things,its good to see I am not alone in trying to get SIMT habits into coding style