Topic: Save MMX, SSE, AVX registers to memory

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7541
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Save MMX, SSE, AVX registers to memory
« Reply #15 on: July 27, 2020, 03:06:49 PM »
Here is a quick toy before I have to go back to configuring Win10 on the new box.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

 entry_point proc

    USING r12
    LOCAL psrc  :QWORD
    LOCAL pdst  :QWORD
    LOCAL asrc  :QWORD
    LOCAL adst  :QWORD
    LOCAL tcnt  :QWORD

    SaveRegs

    memsize equ <1024*1024*1024*4>

    HighPriority

    mov psrc, alloc(memsize+1024)                       ; 4 gig  + 1k
    alignup rax, 512                                    ; align the memory
    mov asrc, rax                                       ; save address in ptr

    conout "  ptr aligned src ",str$(asrc),lf           ; display address

    mov pdst, alloc(memsize+1024)
    alignup rax, 512
    mov adst, rax

    conout "  ptr aligned dst ",str$(adst),lf

    rcall GetTickCount
    mov r12, rax

  ; |||||||||||||||||||||||||||||||||||||||||

    rcall aligned_data_copy,asrc,adst,memsize           ; call block copy proc

  ; |||||||||||||||||||||||||||||||||||||||||

    rcall GetTickCount
    sub rax, r12
    mov r12, rax

    conout "  -----------------------",lf
    conout "   4 gig copy in ",str$(r12)," ms",lf       ; show milliseconds
    conout "  -----------------------",lf,lf

    mfree psrc                                          ; free memory
    mfree pdst

    NormalPriority

    waitkey
    RestoreRegs
    .exit

 entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

  YMMSTACK

aligned_data_copy proc ;;;; src:QWORD,dst:QWORD,bcnt:QWORD

    shr r8, 8                                           ; byte count / 256 = number of 256-byte blocks

  @@:
    vmovdqa ymm0, YMMWORD PTR [rcx]
    vmovdqa ymm1, YMMWORD PTR [rcx+32]
    vmovdqa ymm2, YMMWORD PTR [rcx+64]
    vmovdqa ymm3, YMMWORD PTR [rcx+96]

    vmovdqa ymm4, YMMWORD PTR [rcx+128]
    vmovdqa ymm5, YMMWORD PTR [rcx+160]
    vmovdqa ymm6, YMMWORD PTR [rcx+192]
    vmovdqa ymm7, YMMWORD PTR [rcx+224]

    vmovdqa YMMWORD PTR [rdx], ymm0
    vmovdqa YMMWORD PTR [rdx+32], ymm1
    vmovdqa YMMWORD PTR [rdx+64], ymm2
    vmovdqa YMMWORD PTR [rdx+96], ymm3

    vmovdqa YMMWORD PTR [rdx+128], ymm4
    vmovdqa YMMWORD PTR [rdx+160], ymm5
    vmovdqa YMMWORD PTR [rdx+192], ymm6
    vmovdqa YMMWORD PTR [rdx+224], ymm7

    add rcx, 256
    add rdx, 256

    sub r8, 1
    jnz @B

    ret

aligned_data_copy endp

  STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end

Result on my Haswell.

  ptr aligned src 5368799744
  ptr aligned dst 9663869440
  -----------------------
   4 gig copy in 2125 ms
  -----------------------

Press any key to continue...

hutch at movsd dot com
http://www.masm32.com

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7541
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Save MMX, SSE, AVX registers to memory
« Reply #16 on: July 27, 2020, 06:08:05 PM »
As I expected, the unroll did not make it any faster, but a different instruction choice (non-temporal moves) did.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

  YMMSTACK

aligned_data_copy proc

  ; src = rcx
  ; dst = rdx
  ; cnt = r8

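    shr r8, 5                           ; byte count / 32 = number of 32-byte blocks

  @@:
    vmovntdqa ymm0, YMMWORD PTR [rcx]   ; non-temporal 32-byte load (the YMM form needs AVX2)
    vmovntdq YMMWORD PTR [rdx], ymm0    ; non-temporal 32-byte store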
    add rcx, 32
    add rdx, 32
    sub r8, 1
    jnz @B

    ret

aligned_data_copy endp

  STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
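
A possible variation on the loop above, as an untimed sketch rather than hutch's code. It rests on three points: the YMM form of vmovntdqa needs AVX2, while a plain vmovdqa load only needs AVX; non-temporal stores are weakly ordered, so an sfence after the loop is the usual precaution before other code relies on the copied data; and vzeroupper before returning avoids AVX/SSE transition penalties in the caller. The proc name aligned_nt_copy is made up for this example, and the register use (rcx = src, rdx = dst, r8 = byte count) is assumed to match the post above.

```masm
; Sketch only: streaming copy with a plain aligned load.
; Assumes rcx = src and rdx = dst, both 32-byte aligned, and
; r8 = byte count, a non-zero multiple of 32.

aligned_nt_copy proc                    ; hypothetical name

    shr r8, 5                           ; byte count / 32 = number of 32-byte blocks

  @@:
    vmovdqa  ymm0, YMMWORD PTR [rcx]    ; ordinary aligned 32-byte load (AVX only)
    vmovntdq YMMWORD PTR [rdx], ymm0    ; streaming store, avoids filling the cache
    add rcx, 32
    add rdx, 32
    sub r8, 1
    jnz @B

    sfence                              ; order the streaming stores before later stores
    vzeroupper                          ; clear upper YMM halves before returning
    ret

aligned_nt_copy endp
```

It can be called exactly like the original, e.g. rcall aligned_nt_copy,asrc,adst,memsize. Whether it matches the timing of the vmovntdqa version would have to be measured on the machine in question.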

jj2007

  • Member
  • *****
  • Posts: 10544
  • Assembler is fun ;-)
    • MasmBasic
Re: Save MMX, SSE, AVX registers to memory
« Reply #17 on: July 27, 2020, 06:50:16 PM »
A propos: has anybody tried xsave/xrstor?
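
For reference, a minimal sketch of what an xsave/xrstor round trip can look like in 64-bit code. Nobody in the thread posted this, so treat it as an illustration under assumptions: the OS has enabled XSAVE and AVX state (CPUID.1:ECX bit 27, OSXSAVE, is set and XCR0 has bits 1 and 2 set), and the assembler version knows the XSAVE mnemonics. The proc name xsave_demo is hypothetical. With a feature mask of 7 the standard-format save area is 512 bytes (legacy x87/SSE) + 64 bytes (header) + 256 bytes (upper YMM halves) = 832 bytes, and it must be 64-byte aligned.

```masm
; Sketch only: save x87, SSE and AVX state to the stack and restore it.

xsave_demo proc                         ; hypothetical name

    push rbp
    mov  rbp, rsp
    sub  rsp, 1024                      ; 832-byte area plus slack for alignment
    and  rsp, -64                       ; XSAVE area must be 64-byte aligned

    xor  eax, eax                       ; zero the 64-byte header at offset 512:
    mov  QWORD PTR [rsp+512], rax       ; XSAVE only writes its first 8 bytes,
    mov  QWORD PTR [rsp+520], rax       ; and XRSTOR faults if the rest is not zero
    mov  QWORD PTR [rsp+528], rax
    mov  QWORD PTR [rsp+536], rax
    mov  QWORD PTR [rsp+544], rax
    mov  QWORD PTR [rsp+552], rax
    mov  QWORD PTR [rsp+560], rax
    mov  QWORD PTR [rsp+568], rax

    mov  eax, 7                         ; mask: bit 0 = x87, bit 1 = SSE, bit 2 = AVX
    xor  edx, edx                       ; upper 32 bits of the mask
    xsave [rsp]                         ; save the selected state components

    vxorps ymm0, ymm0, ymm0             ; stand-in for code that clobbers registers

    mov  eax, 7
    xor  edx, edx
    xrstor [rsp]                        ; ymm0 (and the rest) get their old values back

    mov  rsp, rbp
    pop  rbp
    ret

xsave_demo endp
```

Compared with saving individual registers with vmovdqu, this also saves and restores MXCSR and the x87/MMX state in one go, at the cost of a much larger buffer. For other feature masks the required size should be read from CPUID leaf 0Dh rather than hard-coded.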

nidud

  • Member
  • *****
  • Posts: 1980
    • https://github.com/nidud/asmc
Re: Save MMX, SSE, AVX registers to memory
« Reply #18 on: July 27, 2020, 11:35:08 PM »
ifdef _WIN64
  cax equ <rax>
  cbx equ <rbx>
  ccx equ <rcx>
  cdx equ <rdx>
  csi equ <rsi>
  cdi equ <rdi>
  csp equ <rsp>
  cbp equ <rbp>
else
  cax equ <eax>
  cbx equ <ebx>
  ccx equ <ecx>
  cdx equ <edx>
  csi equ <esi>
  cdi equ <edi>
  csp equ <esp>
  cbp equ <ebp>
endif

You may consider this for the registers.

ifndef _WIN64
  rax equ <eax>
  rbx equ <ebx>
  rcx equ <ecx>
  rdx equ <edx>
  rsi equ <esi>
  rdi equ <edi>
  rsp equ <esp>
  rbp equ <ebp>
endif

As for saving AVX registers, I'm not sure about UASM, but Asmc automatically saves registers provided they're used. This applies to vectors used as arguments, however, and only VECTORCALL will size the stack correctly according to the vector size used.

The caller provides the stack (in this case 6 * 32 = 192 bytes; the extra 8 in the sub rsp, 200 below keeps rsp 16-byte aligned):

foo proto vectorcall :yword, :yword, :yword, :yword, :yword, :yword

bar proc

    foo(ymm0, ymm1, ymm2, ymm3, ymm4, ymm5)
    ret

bar endp

Generated code:

bar     PROC
        sub     rsp, 200
        call    foo@@192
        add     rsp, 200
        ret
bar     ENDP


On entry, foo then has a 192-byte parameter area for the 6 YWORDs, starting at [rsp+8]. Depending on the command-line switch /homeparams or OPTION win64, and on whether the parameter is actually used, the assembler saves the vector to the stack.

foo proc vectorcall a:yword, b:yword, c:yword, d:yword, e:yword, f:yword

    vmovups ymm3,a
    vmovups ymm4,b
    ret

foo endp

Generated code:

foo@@192 PROC
        vmovups ymmword ptr [rsp+28H], ymm1
        vmovups ymmword ptr [rsp+8H], ymm0
        push    rbp
        mov     rbp, rsp
        sub     rsp, 96
        vmovups ymm3, ymmword ptr [rbp+10H]
        vmovups ymm4, ymmword ptr [rbp+30H]
        leave
        ret
foo@@192 ENDP

JK

  • Regular Member
  • *
  • Posts: 46
Re: Save MMX, SSE, AVX registers to memory
« Reply #19 on: July 28, 2020, 09:07:48 PM »
Thanks for all your input!

Quote
You may consider this for the registers.

Sometimes I want code that can be assembled for both 32 and 64 bit, and I want to be able to see at first glance that it is such code. "RAX" obviously must be 64 bit. "EAX" could be either (32 and 64 bit) or 32 bit only, so it's ambiguous in this respect. If I name it "CAX" (as I did), I can tell at once that it is common code (C for common). It's just a personal preference.


JK

HSE

  • Member
  • *****
  • Posts: 1379
  • <AMD>< 7-32>
Re: Save MMX, SSE, AVX registers to memory
« Reply #20 on: July 29, 2020, 12:30:33 AM »
ObjAsm, a dual 32/64-bit framework, uses xax, xcx, xsi, xdi, etc.