I am testing the Masm64 SDK, including the May 2020 update (http://masm32.com/board/index.php?topic=8557.0), and wanted to see how ML64 passes REAL8 (double) arguments in the XMM registers. Here is my code (it assembles fine and even runs, but obviously throws an error because the arguments have no meaningful values):
include \masm32\include64\masm64rt.inc
include \masm32\include64\glu32.inc
includelib \masm32\lib64\glu32.lib
.code
txTitle db "Title", 0
txText db "Text", 0
entry_point proc
LOCAL rval :QWORD
LOCAL qobj:QWORD, innerRadius:REAL8, outerRadius:REAL8, slices:DWORD, loops:DWORD, startAngle:REAL8, sweepAngle:REAL8
; C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include\gl\GLU.h
; void APIENTRY gluPartialDisk (
; GLUquadric *qobj,
; GLdouble innerRadius,
; GLdouble outerRadius,
; GLint slices,
; GLint loops,
; GLdouble startAngle,
; GLdouble sweepAngle);
int 3 ; for testing
invoke gluPartialDisk, addr qobj, innerRadius, outerRadius, slices, loops, startAngle, sweepAngle
invoke MessageBox, 0, addr txText, addr txTitle, MB_OK
invoke ExitProcess, 0
ret
entry_point endp
end
However, when launching the exe (only 1536 bytes!) with the debugger, there are no xmm regs passed:
lea rcx,[rbp-78]
mov rdx,[rbp-80]
mov r8,[rbp-88]
mov r9d,[rbp-8C]
mov eax,[rbp-90]
mov [rsp+20],eax
mov rax,[rbp-98]
mov [rsp+28],rax
mov rax,[rbp-A0]
mov [rsp+30],rax
call [<&gluPartialDisk>]
What am I doing wrong?
P.S.: Interesting that gluPartialDisk does lots of pushing:
000007FEEADFAF20 | 48:8BC4 | mov rax,rsp |
000007FEEADFAF23 | 48:8958 08 | mov [rax+8],rbx |
000007FEEADFAF27 | 48:8968 10 | mov [rax+10],rbp |
000007FEEADFAF2B | 56 | push rsi |
000007FEEADFAF2C | 57 | push rdi |
000007FEEADFAF2D | 41:54 | push r12 |
000007FEEADFAF2F | 41:55 | push r13 |
000007FEEADFAF31 | 41:56 | push r14 |
Load an XMM register with a floating point REAL8 value with "movsd", then read whatever you need in the proc. The REAL8 value must be a memory operand. I use a macro "loadsd", but it is just a wrapper around a data section entry and loading its address with LEA.
NOSTACKFRAME
XmmTest proc
addsd xmm0, xmm1
addsd xmm0, xmm2
addsd xmm0, xmm3
addsd xmm0, xmm4
addsd xmm0, xmm5
addsd xmm0, xmm6
addsd xmm0, xmm7
ret
XmmTest endp
STACKFRAME
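Applied to the gluPartialDisk call from the question, loading the registers by hand might look like the sketch below. It is not the SDK's invoke macro. It assumes the Windows x64 convention, where argument slots are positional (slot 2 is RDX or XMM1, slot 3 is R8 or XMM2), that the locals from the question hold valid values, that qobj holds the pointer returned by gluNewQuadric, and that the procedure prologue has already reserved at least 38h bytes of 16-byte-aligned outgoing argument space at RSP.

```asm
; Sketch only, under the assumptions stated above.
; void gluPartialDisk(qobj, innerRadius, outerRadius, slices, loops,
;                     startAngle, sweepAngle)
    mov     rcx, qobj                       ; slot 1: GLUquadric* in RCX
    movsd   xmm1, innerRadius               ; slot 2: GLdouble in XMM1 (not RDX)
    movsd   xmm2, outerRadius               ; slot 3: GLdouble in XMM2 (not R8)
    mov     r9d, slices                     ; slot 4: GLint in R9D
    mov     eax, loops
    mov     dword ptr [rsp+20h], eax        ; slot 5: GLint on the stack
    mov     rax, qword ptr startAngle
    mov     qword ptr [rsp+28h], rax        ; slot 6: GLdouble bit pattern on the stack
    mov     rax, qword ptr sweepAngle
    mov     qword ptr [rsp+30h], rax        ; slot 7: GLdouble bit pattern on the stack
    call    gluPartialDisk
```

Compared with the debugger listing in the question, only slots 2 and 3 change: the doubles go to XMM1 and XMM2 instead of RDX and R8. Stack arguments are passed as raw 64 bit values either way.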
Some macros64 modifications:
;; **************************
;; first 4 register arguments
;; **************************
IFNB <a1>
REGISTER a1,cl,cx,ecx,rcx,xmm0
ENDIF
IFNB <a2>
REGISTER a2,dl,dx,edx,rdx,xmm1
ENDIF
IFNB <a3>
REGISTER a3,r8b,r8w,r8d,r8,xmm2
ENDIF
IFNB <a4>
REGISTER a4,r9b,r9w,r9d,r9,xmm3
ENDIF
REGISTER MACRO anum,breg,wreg,dreg,qreg,xreg
·······
IF ssize GT 9 ;; handle REAL4 PTR
lead SUBSTR <anum>,1,9
IFIDNI lead,<REAL4 PTR>
mov qreg, anum
cvtss2sd xreg, DWORD PTR anum
goto elbl
ENDIF
ENDIF
IF ssize GT 9 ;; handle REAL8 PTR
lead SUBSTR <anum>,1,9
IFIDNI lead,<REAL8 PTR>
mov qreg, anum
movsd xreg, qword ptr anum
goto elbl
ENDIF
ENDIF
Attached macros64B.inc
Hi Hector,
Sorry if I have been a bit slow, I have been ploughing through a mountain of work and got a bit snowed under. I like the idea in the first one, will have to have a play with it to make sure there are no unintended side effects but it looks like a good idea. I have not digested the second one yet.
I am inclined to keep 32 and 64 bit SSE2 separate.
This is a test piece I was working on recently. So far it will sequentially load as many 64 bit pseudo immediates as you point at it, but I ran out of time to finish it off.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
xmm_reg_call MACRO pname, args:VARARG
LOCAL buff, var
buff equ <>
cnt = 0
for arg, <args>
buff CATSTR <loadsd >,<xmm>,num2str(cnt),<, >,<arg> ;; join all strings
% buff ;; output as string
cnt = cnt + 1 ;; increment register counter
ENDM
call pname ;; call the procedure
ENDM
xrv MACRO pname, args :VARARG
xmm_reg_call pname, args
EXITM < xmm0>
ENDM
xinvoke MACRO pname, args :VARARG
xmm_reg_call pname, args
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL fvar :REAL8
LOCAL tvar :REAL8
LOCAL pbuf :QWORD
LOCAL buff[64] :BYTE
mov pbuf, ptr$(buff)
movsd fvar, xrv(XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
rcall fptoa,fvar,pbuf
conout " Result = ",pbuf,lf
xinvoke XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
movsd fvar, xmm0
rcall fptoa,fvar,pbuf
conout " Result = ",pbuf,lf
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
XmmTest proc
addsd xmm0, xmm1
addsd xmm0, xmm2
addsd xmm0, xmm3
addsd xmm0, xmm4
addsd xmm0, xmm5
addsd xmm0, xmm6
addsd xmm0, xmm7
ret
XmmTest endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
Hi Hutch!
No hurry at all :biggrin:
The line can be shorter using expansion (%) instead of num2str:
buff CATSTR <loadsd >,<xmm>,%cnt,<, >,<arg> ;; join all strings
num2str is good for: %echo num2str(cnt)
:thumbsup:
I have got the earlier macro to work on different data notations and it seems to be working OK. I got a notation to work in an invoke call if it is used as the last argument.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
; *************************************************************
load_xmm_regs MACRO args:VARARG
LOCAL buff,aattr
buff equ <>
cnt = 0
;; ---------------------------------------------------------
for arg, <args>
aattr = getattr(arg) ;; get the argument attribute
IF aattr EQ 36 ;; INT immediate, exit on error
.err < ---{ notation error, missing floating point }--->
%echo ------------------------------------------------------------------
%echo ERROR => arg : x86-64 does not support loading immediate integers
%echo in an XMM register. Use FP format value instead => arg.0
%echo Assembly is TERMINATED at this point
%echo ------------------------------------------------------------------
EXITM <> ;; stop the macro here on error
END ;; terminate the assembly
ENDIF
IF aattr EQ 0
buff CATSTR <loadsd >,<xmm>,%cnt,<, >,<arg> ;; pseudo immediate
goto writeit
ENDIF
IF aattr EQ 98
buff CATSTR <movsd >,<xmm>,%cnt,<, >,<arg> ;; LOCAL var
goto writeit
ENDIF
IF aattr EQ 42
buff CATSTR <movsd >,<xmm>,%cnt,<, >,<arg> ;; GLOBAL var
goto writeit
ENDIF
IF aattr EQ 48
buff CATSTR <movsd >,<xmm>,%cnt,<, >,<arg> ;; XMM register
goto writeit
ENDIF
:writeit
% buff ;; output as string
cnt = cnt + 1 ;; increment register counter
ENDM
;; ---------------------------------------------------------
ENDM
; *************************************************************
xrv MACRO pname:REQ, args :VARARG ;; function form
load_xmm_regs args
call pname ;; call the procedure
EXITM < xmm0>
ENDM
xinvoke MACRO pname:REQ, args :VARARG ;; statement form
load_xmm_regs args
call pname ;; call the procedure
ENDM
xload MACRO args:VARARG
load_xmm_regs args ;; load XMM regs only
ENDM
xloadrv MACRO args:VARARG
load_xmm_regs args ;; load XMM regs only
EXITM <> ;; allow empty function form
ENDM
; *************************************************************
.data?
gvar REAL8 ? ; global for testing
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
LOCAL lvar :REAL8
LOCAL tvar :REAL8
LOCAL pbuf :QWORD
LOCAL buff[64] :BYTE
mov pbuf, ptr$(buff)
movsd lvar, xrv(XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
rcall fptoa,lvar,pbuf
conout " Result = ",pbuf,lf
xinvoke XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
movsd lvar, xmm0
rcall fptoa,lvar,pbuf
conout " Result = ",pbuf,lf,lf
; -------------------------------------------------------------------
; with invoke, the integer args must come first. FP load must be last
; -------------------------------------------------------------------
invoke LoadTest,1111,2222,3333,4444, xloadrv(1234.1234, 5678.5678)
waitkey
.exit
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
XmmTest proc
addsd xmm0, xmm1
addsd xmm0, xmm2
addsd xmm0, xmm3
addsd xmm0, xmm4
addsd xmm0, xmm5
addsd xmm0, xmm6
addsd xmm0, xmm7
ret
XmmTest endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
LoadTest proc
USING r12,r13,r14,r15
LOCAL var1 :REAL8
LOCAL var2 :REAL8
LOCAL ptxt :QWORD
LOCAL text[64]:BYTE
SaveRegs
movsd var1, xmm0 ; copy to REAL8 first
movsd var2, xmm1 ; something in fptoa overwrites at least one XMM register
mov ptxt, ptr$(text)
mov r12, rcx
mov r13, rdx
mov r14, r8
mov r15, r9
conout " Integer ",str$(r12),lf
conout " Integer ",str$(r13),lf
conout " Integer ",str$(r14),lf
conout " Integer ",str$(r15),lf,lf
invoke fptoa,var1,ptxt
conout " SD Float ",ptxt,lf
invoke fptoa,var2,ptxt
conout " SD Float ",ptxt,lf
RestoreRegs
ret
LoadTest endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
I have split this topic where it shifted from the 64 bit MASM project into wider and different topics. It can be found in the 64 bit assembler forum for any who wish to contribute to that discussion.
This subforum is reserved for 64 bit MASM and I will move anything that is different.
Quote from: hutch-- on April 12, 2021, 06:21:56 AM
I have split this topic where it shifted from the 64 bit MASM project into wider and different topics. It can be found in the 64 bit assembler forum for any who wish to contribute to that discussion.
I am writing some SSE2 scalar math functions following the 64 bit ABI. If I use xmm0 as input register, what's the best standard return result register to use?
Is there any need for an SSE2 math lib?
Quote from: daydreamer on August 16, 2022, 09:49:12 PM
if I use xmm0 as input register
Quote from: x64 calling convention
__m128 types, arrays, and strings are never passed by immediate value. Instead, a pointer is passed to memory allocated by the caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64 types, are passed as if they were integers of the same size. Structs or unions of other sizes are passed as a pointer to memory allocated by the caller. For these aggregate types passed as a pointer, including __m128, the caller-allocated temporary memory must be 16-byte aligned.
Quote
whats best standard return result register to use?
Quote from: x64 calling convention
A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX. Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0. The state of unused bits in the value returned in RAX or XMM0 is undefined.
:thumbsup: Good to read x64 ABI from time to time.
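As a small illustration of the quoted rule, here is a sketch of a scalar function that takes its doubles in XMM0 to XMM2 and returns the result in XMM0. The name MulAdd is hypothetical, and the code assumes plain ML64 with no automatic stack frame (inside the Masm64 SDK it would sit between NOSTACKFRAME and STACKFRAME, like XmmTest earlier in the thread).

```asm
; Hypothetical leaf function: double MulAdd(double a, double b, double c)
; In:  xmm0 = a, xmm1 = b, xmm2 = c   (Windows x64 argument slots 1 to 3)
; Out: xmm0 = a*b + c                 (standard return register for a double)
MulAdd proc
    mulsd   xmm0, xmm1                  ; xmm0 = a * b
    addsd   xmm0, xmm2                  ; xmm0 = a * b + c
    ret
MulAdd endp
```

Because it only touches the volatile registers XMM0 to XMM2 and never moves RSP, it needs no prologue or epilogue.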
magnus,
If you are going to use the XMM registers, you would generally pass args in XMM registers rather than doing double conversions to and from XMM registers. For return values, convention says a single return should be in XMM0, but there is nothing to stop you from returning multiple values in more than one XMM register. Writing assembler is more flexible than a C++ compiler, but you still must preserve the non-volatile XMM registers (XMM6 to XMM15 in the Windows x64 convention), and there are some C++ vector calls that use some of the lower XMM registers as well.
If you are going to interface with 64 bit C++, you must understand what register preservations are required.
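A minimal sketch of such a preservation, assuming the Windows x64 convention, in which XMM6 to XMM15 are non-volatile and must be restored by any function that changes them. The name SumOfSquares is hypothetical, the code assumes plain ML64 with no automatic stack frame, and it omits the unwind directives (PROC FRAME, .allocstack, .savexmm128, .endprolog) that a function meant to take part in exception handling would declare.

```asm
; Hypothetical function: double SumOfSquares(double a, double b)
; In:  xmm0 = a, xmm1 = b
; Out: xmm0 = a*a + b*b
; Uses XMM6 as scratch purely to show the save and restore.
SumOfSquares proc
    sub     rsp, 18h                    ; 16-byte save slot + 8 bytes to realign RSP
    movaps  xmmword ptr [rsp], xmm6     ; save non-volatile XMM6 (slot is 16-byte aligned)
    movsd   xmm6, xmm1                  ; xmm6 = b
    mulsd   xmm0, xmm0                  ; xmm0 = a * a
    mulsd   xmm6, xmm6                  ; xmm6 = b * b
    addsd   xmm0, xmm6                  ; xmm0 = a*a + b*b (return value)
    movaps  xmm6, xmmword ptr [rsp]     ; restore the caller's XMM6
    add     rsp, 18h
    ret
SumOfSquares endp
```

In real code the simpler choice is to use a volatile register (XMM0 to XMM5) as scratch and avoid the save entirely.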