Saving xmm registers

Started by Gunther, October 12, 2012, 10:52:17 AM

Gunther

This question isn't relevant for every application, but there are situations where many of the multimedia (XMM) registers are in use. According to Microsoft's x64 Software Conventions for VC, xmm0 - xmm5 are volatile; the other registers, xmm6 - xmm15, are nonvolatile and must be preserved by the callee. http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

Does anyone have an idea how to do that in a reasonable manner?

Gunther
You have to know the facts before you can distort them.

qWord

Save them like GPRs on the (aligned) stack using MOVDQA.  jWasm can also do this with the USES statement:
foo PROC uses xmm5 xmm6 ...
MREAL macros - when you need floating point arithmetic while assembling!

Gunther

Thank you qWord for the fast reply. Do you have a small code example?

Gunther

qWord

hi,
not sure what example you expect, but maybe this helps:
option casemap :none
option frame:auto
option procalign:16
option fieldalign:8

JWASM_STORE_REGISTER_ARGUMENTS  EQU 1
JWASM_STACK_SPACE_RESERVATION   EQU 2

option win64:JWASM_STACK_SPACE_RESERVATION ; see documentation


.code
; hand crafted
foo1:
    sub rsp,8+2*16 + 4*8    ; 8 = align 16
                            ; 2*16 = XMM registers
                            ; 4*8 = shadow space - only needed, if APIs are called
   
    movdqa OWORD ptr [rsp+4*8+00],xmm5
    movdqa OWORD ptr [rsp+4*8+16],xmm6
   
    ; ...
   
    movdqa xmm5,OWORD ptr [rsp+4*8+00]
    movdqa xmm6,OWORD ptr [rsp+4*8+16]
   
    add rsp,8+2*16+4*8
    ret

foo2 proc FRAME uses xmm5 xmm6 myArg:DWORD
LOCAL x:REAL8   
   
    ;...
   
    ret
   
foo2 endp


main proc FRAME
   
    call foo1

    invoke foo2,123

    ret
   
main endp
end main

jwasm.exe /win64 src.asm
polink /SUBSYSTEM:CONSOLE src.obj

Also take a look at the examples that come with jWasm.
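
For readers who want the hand-crafted variant to carry unwind information as well, here is a minimal sketch. It assumes the assembler accepts the ML64-style directives .allocstack, .savexmm128 and .endprolog, and that option frame:auto is not in effect. The name foo3 and the choice of xmm6/xmm7 are hypothetical and only for illustration. It is a leaf procedure, so no shadow space is reserved.

```asm
option casemap:none

.code
; Hypothetical leaf procedure (illustrative name), prologue written by hand.
foo3 proc FRAME
    sub rsp,8+2*16                      ; 8 = re-align rsp to 16, 2*16 = two XMM slots
    .allocstack 8+2*16                  ; describe the allocation to the unwinder
    movdqa OWORD ptr [rsp+00],xmm6      ; save nonvolatile xmm6
    .savexmm128 xmm6,0                  ; offset must be a multiple of 16
    movdqa OWORD ptr [rsp+16],xmm7      ; save nonvolatile xmm7
    .savexmm128 xmm7,16
    .endprolog

    ; ... body that may clobber xmm6 and xmm7 ...

    movdqa xmm6,OWORD ptr [rsp+00]      ; restore in any order before the epilogue
    movdqa xmm7,OWORD ptr [rsp+16]
    add rsp,8+2*16
    ret
foo3 endp
end
```

If the procedure calls other functions, another 4*8 bytes of shadow space would have to be added below the save area, as in foo1 above.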

Gunther

Hi qWord,

I've downloaded the jWasm package and I think your example will help. Thank you. You're one of the jWasm contributors?

Gunther

qWord

Quote from: Gunther on October 12, 2012, 11:59:49 AM
You're one of the jWasm contributors?

yes, bug reports for jWasm and WinInc  :P

Gunther

Hi qWord,

Quote from: qWord on October 12, 2012, 12:16:55 PM
yes, bug reports for jWasm and WinInc  :P

I hope tracking bugs doesn't mean too much work. JWasm is a well-written tool and rock solid, as far as I know.

Gunther

frktons

Quote from: Gunther on October 12, 2012, 10:52:17 AM
This question isn't relevant for every application, but there are situations where many of the multimedia (XMM) registers are in use. According to Microsoft's x64 Software Conventions for VC, xmm0 - xmm5 are volatile; the other registers, xmm6 - xmm15, are nonvolatile and must be preserved by the callee. http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx

Does anyone have an idea how to do that in a reasonable manner?

Gunther

Another simple way I can think of, though slower than using the stack, is to store them
in memory variables declared for that purpose:


.data?

align 16
xmm0_var DB 16 DUP (?)
xmm1_var DB 16 DUP (?)

....

.code

movdqa oword ptr xmm0_var, xmm0 ; push the xmm0 register
movdqa xmm1, oword ptr xmm0_var ; pop it to another xmm register

or a little bit faster:

movaps oword ptr xmm0_var, xmm0          ; push the xmm0 register
pshufd  xmm1, oword ptr xmm0_var, 0E4H ; pop it to another xmm register
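
The memory-variable idea above could be sketched as a complete leaf procedure like this. The names bar, save_xmm6 and save_xmm7 are made up for illustration, and OWORD ? is used instead of DB 16 DUP (?) so that no size override is needed. Note the assumption stated in the comments: static save slots are shared, so this only works if the procedure is never re-entered (no recursion, no second thread).

```asm
option casemap:none

.data?
align 16                    ; movdqa faults on unaligned memory operands
save_xmm6 OWORD ?           ; hypothetical static save slot for xmm6
save_xmm7 OWORD ?           ; hypothetical static save slot for xmm7

.code
; Hypothetical leaf procedure. ASSUMPTION: never re-entered, because the
; save slots are global and a second activation would overwrite them.
bar proc
    movdqa save_xmm6,xmm6   ; save nonvolatile registers to memory
    movdqa save_xmm7,xmm7

    ; ... body that may clobber xmm6 and xmm7 ...

    movdqa xmm6,save_xmm6   ; restore into the same registers
    movdqa xmm7,save_xmm7
    ret
bar endp
end
```

Saving on the stack, as in foo1, avoids the re-entrancy restriction because every call gets its own slots.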

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

Gunther

Quote from: frktons on January 15, 2013, 11:20:01 PM
Another simple way I can think of, though slower than using the stack, is to store them
in memory variables declared for that purpose:


.data?

align 16
xmm0_var DB 16 DUP (?)
xmm1_var DB 16 DUP (?)

....

.code

movdqa oword ptr xmm0_var, xmm0 ; push the xmm0 register
movdqa xmm1, oword ptr xmm0_var ; pop it to another xmm register

or a little bit faster:

movaps oword ptr xmm0_var, xmm0          ; push the xmm0 register
pshufd  xmm1, oword ptr xmm0_var, 0E4H ; pop it to another xmm register



yes, of course. Thank you for the answer, Frank.

Gunther