News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Memory mapped random generator

Started by hutch--, June 18, 2019, 04:49:07 AM

Previous topic - Next topic

hutch--

The problem had been with the duration of saving the file output so I set up a method that uses memory mapped files where you write directly to the open mapped memory and it just about excludes the file save time which was far longer than the random generation. The random algorithm used is the library "irand" with some help from the "rrand" range version to vary the interval length of reseeding the main algo. This version has a very large capacity and has been tested up to 48 gigabytes running on my middle aged Haswell and a bit over 1 gig a second.

The app reseeds the irand algo at random intervals within a range which provides at least some brute force protection but the random pad could still theoretically be attacked by someone who spent long enough running brute force attacks with the QWORD range and get partial decryptions. What needs to be done is a multi thread or multipass version that properly protects the pad from brute force. This can be done by either the same irand algo or using a multi thread rdrand pad xorred against the first pad.

Just as an aside, the test piece that Jose did with the 64 threads was possible because the rdrand mnemonic is very slow and does not fully utilise the thread. The library "irand" algo and similar are much faster that rdrand and they saturate the thread they are contained in. It is still a useful technique because of its different random source which can be combined with a seeded algo.

The attached file is set to a low 1 gig as most 64 bit machines can run it. This much I have found with using  a memory mapped file for output is that it must not exceed available memory.

aw27

Quote from: hutch-- on June 18, 2019, 04:49:07 AM
Just as an aside, the test piece that Jose did with the 64 threads was possible because the rdrand mnemonic is very slow and does not fully utilise the thread. The library "irand" algo and similar are much faster that rdrand and they saturate the thread they are contained in. It is still a useful technique because of its different random source which can be combined with a seeded algo.

Exactly, the need for multithreading implies some degree of implicit slowness.


daydreamer

#2
Lol AW suggests the threads are a lot of underdogs?

Hutch great work 7 times faster than the filewrite one 922ms,mine is a i5,that runs 2.5ghz+
maybe should try a different test that spinsup cpus to max and after that run random pad program?
,one of advantages with 64bit is you can run many threads not limited by 32bit memory alloc,if you run memory intensive cg,with each. Thread
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

aw27

Quote from: daydreamer on June 18, 2019, 03:49:11 PM
Lol AW suggests the threads are a lot of underdogs?

I did not invent it:
https://9gag.com/gag/aWYXDwd/multithreading

hutch--

Magnus,

I am reasonably happy with the mapped file technique as it presents a writable block of memory that you can write directly to without the writefile technique and this saves a reasonable amount of time. This changes the internal logic in that instead of allocating memory and processing it, you have a linear address range that you write to which normally requires one pass to write it. The last example used multi seeding of random lengths but to properly protect the random pad it needs more than one pass to break attempts at brute forcing.

hutch--

Here is the next version, it solves the problem of being vulnerable to brute force attack by doing a second xor pass which makes reconstructing the pad almost impossible. Source is attached. Something to note, if you set the pad size in excess of installed memory, it will stop the OS stone dead so if you modify the pad size, stay under your installed memory. The extra pass incurred a longer run time but it is still reasonably fast, it moved from about 900 ms on a 1 gig pad to about 1350 ms with the extra pass. I will try a multi-threaded version to see if I can reduce the timing.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    USING rsi,r12,r13,r14,r15
    LOCAL hMMF  :QWORD                          ; memory mapped file handle
    LOCAL pMMF  :QWORD                          ; pointer to mapped memory
    LOCAL hFile :QWORD                          ; save file handle
    LOCAL block :QWORD                          ; loop counter

    SaveRegs

    HighPriority

    mov r13, 1024*1024*1024*1                   ; the requested pad size
    mov r14, r13                                ; preserve original length in r14

    conout "Creating random pad",lf

    mov hFile, flcreate("random.pad")           ; create the open file
    invoke OpenMapFile,hFile,"mapfile", \
                  r13,ADDR hMMF,ADDR pMMF       ; open the file mapping

    mov r15, rv(GetTickCount)                   ; start timing

  ; *********************************************
  ;              write data pass
  ; *********************************************

    shr r13, 3                                  ; set block size from request size

    call reseed                                 ; get a seed for the random algo
    rcall seed_irand,rax
    mov r12, pMMF                               ; mapped memory address into r12

    mov rsi, 1                                  ; set the initial reseeding count

    call reseed                                 ; get a seed for the range random algo
    rcall seed_rrand,rax                        ; reseed the range random algo
  @@:
    rcall irand                                 ; call random algo
    mov QWORD PTR [r12], rax
    add r12, 8                                  ; set next write location
    sub rsi, 1                                  ; decrement the interval counter
    jnz nxt
  ; ----------------------
    rcall rrand, 11,1663                        ; call range random for reseeding interval
    mov rsi, rax
    call reseed                                 ; get a seed for the random algo
    rcall seed_irand,rax                        ; reseed the integer random algo
  ; ----------------------
  nxt:
    sub r13, 1                                  ; decrement loop counter
    jnz @B

  ; *********************************************
  ;               xor data pass
  ; *********************************************

    rcall rrand, 15,60
    rcall SleepEx,rax,0

    call reseed                                 ; get a seed for the random algo
    rcall seed_irand,rax
    mov r12, pMMF                               ; mapped memory address into r12

    mov rsi, 3                                  ; set the initial reseeding count

    call reseed                                 ; get a seed for the range random algo
    rcall seed_rrand,rax                        ; reseed the range random algo

    mov r13, r14
    shr r13, 3

  @@:
    rcall irand                                 ; call random algo
    xor QWORD PTR [r12], rax
    add r12, 8                                  ; set next write location
    sub rsi, 1                                  ; decrement the interval counter
    jnz nxt1
  ; ----------------------
    rcall rrand, 17,4923                        ; call range random for reseeding interval
    mov rsi, rax
    call reseed                                 ; get a seed for the random algo
    rcall seed_irand,rax                        ; reseed the integer random algo
  ; ----------------------
  nxt1:
    sub r13, 1                                  ; decrement loop counter
    jnz @B

  ; *********************************************

    rcall GetTickCount                          ; end timing
    sub rax, r15
    conout "Timing = ",str$(rax)," ms",lf

    conout "File written to disk at ",str$(r14)," bytes",lf

    rcall CloseMMF,pMMF,hMMF                    ; close the file mapping
    rcall CloseHandle,hFile                     ; close the file

    NormalPriority

    waitkey
    RestoreRegs
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OpenMapFile proc hFile:QWORD,lpName:QWORD,bcnt:QWORD,pHandle:QWORD,pMemory:QWORD

    LOCAL hi32  :DWORD
    LOCAL lo32  :DWORD

    mov lo32, r8d                               ; split hi and lo DWORDS of bcnt
    rol r8, 32
    mov hi32, r8d

    invoke CreateFileMapping, \
           hFile, \                             ; open file handle
           NULL, \
           PAGE_READWRITE, \                    ; read write access to memory
           hi32, \                              ; high DWORD of bcnt
           lo32, \                              ; low DWORD of bcnt
           lpName                               ; set file object name here

    mov rdx, rax                                ; MMF handle in RDX
    mov rcx, pHandle                            ; address of variable in RCX
    mov [rcx], rax                              ; write MMF handle to variable

    invoke MapViewOfFile, \
           rdx, FILE_MAP_WRITE,0,0,0
    mov rcx, pMemory                            ; address of variable in RCX
    mov [rcx], rax                              ; write start address to variable

    ret

OpenMapFile endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

reseed proc

    mov rax, 100                                ; out of range number
    cpuid                                       ; set all regs to 0
    rdtsc                                       ; date time counter
    pause                                       ; spinlock pause
    bswap rax                                   ; reverse byte order
    mov r11, rax                                ; store rax in r11

    mov rcx, 10                                 ; loop count
  @@:
    rdtsc                                       ; date time counter
    pause                                       ; spinlock pause
    bswap rax                                   ; reverse byte order
    rol rax, 7                                  ; rotate left by prime
    xor r11, rax                                ; xor rax to r11
    rol r11, 5                                  ; rotate left by prime
    sub rcx, 1                                  ; decrement counter

    jnz @B
    mov rax, r11                                ; return the value in r11
    ret

reseed endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end

daydreamer

what about you unroll it so you can use vxorps 256bit ?
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

hutch--

 :skrewy:

To do what ? Have a look at the xor pass, it does almost the same as the first pass except that it xors the result to the same offset as the first pass. Now while writing in larger blocks makes sense, to take QWORD sized random numbers and load them into a larger data size, you incur the overhead of the loading which is still a number of random generator procedure calls and then the loading overhead.  :winking:

aw27

CUDA has a vast random number generation library, called CURAND. However, I don't know if it feasible to use from ASM, because it does not appear to support the driver API (only the runtime API). Something I will check sometime in the future.

daydreamer

#9
Hutch maybe you should try multithreaded
I lack rols so I need todo workarounds and ideas for seeds after PI,sqrt(2) to start with 128bit+,maybe use very big prime numbers
adding rdtsc value
got stucked,upper 64bit doesnt work
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

hutch--

Magnus,

You worry me at times. If you look at the reseed algo in the examples, it only uses the low 32 bit of rdtsc and then it does a sequence of rotates and xor to always produce a 64 bit integer. I will shortly post the final version but in between I wrote a multi-threaded version that was far slower than the single thread version because the thread overhead exceeded the runtime of the original parent thread. I started with 8 threads, then 4 threads, then 2 threads and finally 1 thread and the single thread was the fastest.

daydreamer

Quote from: hutch-- on June 22, 2019, 11:47:09 AM

You worry me at times. If you look at the reseed algo in the examples, it only uses the low 32 bit of rdtsc and then it does a sequence of rotates and xor to always produce a 64 bit integer. I will shortly post the final version but in between I wrote a multi-threaded version that was far slower than the single thread version because the thread overhead exceeded the runtime of the original parent thread. I started with 8 threads, then 4 threads, then 2 threads and finally 1 thread and the single thread was the fastest.
just curious how fast zero threads would run :file system call to check if random pad file already exist=exit without make random pad

this memory mapped file idea is too good to not try code SIMD :icon_idea:
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

hutch--

 :nie:

> this memory mapped file idea is too good to not try code SIMD

Feel free to code a legacy SSE random algo that tests well then work out how to seed it. The assumption that trying to do everything in SSE is not sound. Note that SSE and even the later AVX were primarily designed for high speed graphics, not high speed complex calculations. The bottom line is you must be able to demonstrate that the results are,

1. random
2. not easily reproducible.

daydreamer

Quote from: hutch-- on June 23, 2019, 02:42:58 PM
Feel free to code a legacy SSE random algo that tests well then work out how to seed it. The assumption that trying to do everything in SSE is not sound. Note that SSE and even the later AVX were primarily designed for high speed graphics, not high speed complex calculations. The bottom line is you must be able to demonstrate that the results are,

1. random
2. not easily reproducible.
thanks Hutch for inspire me to continue and improve unfinished SSE code

I am working with improve other things,its good to see I am not alone in trying to get SIMT habits into coding style





my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding