The MASM Forum

64 bit assembler => 64 bit assembler. Conceptual Issues => Topic started by: Gunther on October 12, 2012, 10:52:17 AM

Title: Saving xmm registers
Post by: Gunther on October 12, 2012, 10:52:17 AM
This question isn't relevant for every application, but there can be situations where a large number of multimedia registers are in use. MS says in the x64 Software Conventions for VC that xmm0 - xmm5 are volatile; the other registers, xmm6 - xmm15, are nonvolatile and must be preserved as needed by the callee. http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

Does anyone have a clue how to do that in a reasonable manner?

Gunther
Title: Re: Saving xmm registers
Post by: qWord on October 12, 2012, 11:11:15 AM
Save them like GPRs on the (aligned) stack using MOVDQA.  jWasm can also do this with the USES statement:
foo PROC uses xmm5 xmm6 ...
Title: Re: Saving xmm registers
Post by: Gunther on October 12, 2012, 11:20:44 AM
Thank you qWord for the fast reply. Do you have a small code example?

Gunther
Title: Re: Saving xmm registers
Post by: qWord on October 12, 2012, 11:48:16 AM
hi,
not sure what example you expect, but maybe this helps:
option casemap :none
option frame:auto
option procalign:16
option fieldalign:8

JWASM_STORE_REGISTER_ARGUMENTS  EQU 1
JWASM_STACK_SPACE_RESERVATION   EQU 2

option win64:JWASM_STACK_SPACE_RESERVATION ; see documentation


.code
; hand crafted
foo1:
    sub rsp,8+2*16 + 4*8    ; 8 = align 16
                            ; 2*16 = XMM registers
                            ; 4*8 = shadow space - only needed, if APIs are called
   
    movdqa OWORD ptr [rsp+4*8+00],xmm5
    movdqa OWORD ptr [rsp+4*8+16],xmm6
   
    ; ...
   
    movdqa xmm5,OWORD ptr [rsp+4*8+00]
    movdqa xmm6,OWORD ptr [rsp+4*8+16]
   
    add rsp,8+2*16+4*8
    ret

foo2 proc FRAME uses xmm5 xmm6 myArg:DWORD
LOCAL x:REAL8   
   
    ;...
   
    ret
   
foo2 endp


main proc FRAME
   
    call foo1

    invoke foo2,123

    ret
   
main endp
end main

Build with:
jwasm.exe /win64 src.asm
polink /SUBSYSTEM:CONSOLE src.obj

Also take a look at the examples that come with jWasm.
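
One thing the hand-crafted foo1 above leaves out is unwind information. If exceptions can pass through the function, the prologue should be described with the unwind directives. The following is only a sketch under some assumptions: foo3 is a made-up name, xmm6/xmm7 are picked just for illustration, and it assumes that option frame:auto is not in effect for this procedure, so the prologue is written entirely by hand.

```asm
; Sketch only: hand-written prologue with unwind annotations.
; Assumption: OPTION FRAME:AUTO is NOT active, so no prologue is generated.
; foo3 is a hypothetical name; xmm6/xmm7 are example registers.
.code

foo3 proc FRAME
    sub rsp, 8+2*16+4*8             ; 8 = align 16, 2*16 = XMM save area,
    .allocstack 8+2*16+4*8          ; 4*8 = shadow space for callees

    movdqa OWORD ptr [rsp+4*8+00], xmm6
    .savexmm128 xmm6, 4*8+00        ; offset must be a multiple of 16
    movdqa OWORD ptr [rsp+4*8+16], xmm7
    .savexmm128 xmm7, 4*8+16
    .endprolog                      ; end of the described prologue

    ; ... body that may clobber xmm6 and xmm7 ...

    movdqa xmm6, OWORD ptr [rsp+4*8+00]
    movdqa xmm7, OWORD ptr [rsp+4*8+16]
    add rsp, 8+2*16+4*8
    ret
foo3 endp

end
```

The stack layout is the same as in foo1; the directives only emit unwind data and generate no instructions, so the code path itself is unchanged.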
Title: Re: Saving xmm registers
Post by: Gunther on October 12, 2012, 11:59:49 AM
Hi qWord,

I've downloaded the jWasm package and I think your example will help. Thank you. You're one of the jWasm contributors?

Gunther
Title: Re: Saving xmm registers
Post by: qWord on October 12, 2012, 12:16:55 PM
Quote from: Gunther on October 12, 2012, 11:59:49 AM
You're one of the jWasm contributors?
yes, bug reports for jWasm and WinInc  :P
Title: Re: Saving xmm registers
Post by: Gunther on October 12, 2012, 11:24:27 PM
Hi qWord,

Quote from: qWord on October 12, 2012, 12:16:55 PM
yes, bug reports for jWasm and WinInc  :P

I hope tracking bugs isn't too much work. jWasm is a well-written tool and rock solid, as far as I know.

Gunther
Title: Re: Saving xmm registers
Post by: frktons on January 15, 2013, 11:20:01 PM
Quote from: Gunther on October 12, 2012, 10:52:17 AM
This question isn't relevant for every application, but there can be situations where a large number of multimedia registers are in use. MS says in the x64 Software Conventions for VC that xmm0 - xmm5 are volatile; the other registers, xmm6 - xmm15, are nonvolatile and must be preserved as needed by the callee. http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

Does anyone have a clue how to do that in a reasonable manner?

Gunther

Another simple way I can think of, though slower than using the stack, is to save them
to variables in memory declared for that purpose:


.data?

align 16
xmm0_var DB 16 DUP (?)
xmm1_var DB 16 DUP (?)

....

.code

movdqa oword ptr xmm0_var, xmm0 ; push the xmm0 register
movdqa xmm1, oword ptr xmm0_var ; pop it to another xmm register

or a little bit faster:

movaps oword ptr xmm0_var, xmm0          ; push the xmm0 register
pshufd  xmm1, oword ptr xmm0_var, 0E4H ; pop it to another xmm register

Title: Re: Saving xmm registers
Post by: Gunther on January 16, 2013, 08:10:30 AM
Quote from: frktons on January 15, 2013, 11:20:01 PM
Another simple way I can think of, though slower than using the stack, is to save them
to variables in memory declared for that purpose:


.data?

align 16
xmm0_var DB 16 DUP (?)
xmm1_var DB 16 DUP (?)

....

.code

movdqa oword ptr xmm0_var, xmm0 ; push the xmm0 register
movdqa xmm1, oword ptr xmm0_var ; pop it to another xmm register

or a little bit faster:

movaps oword ptr xmm0_var, xmm0          ; push the xmm0 register
pshufd  xmm1, oword ptr xmm0_var, 0E4H ; pop it to another xmm register



Yes, of course. Thank you for the answer, Frank.

Gunther