X64 ABI, REAL8 passed in xmmreg?

Started by jj2007, April 08, 2021, 11:23:32 AM

jj2007

I am testing the Masm64 SDK (including the May 2020 update), and wanted to see how ML64 passes REAL8 (double) arguments in the XMM registers. Here is my code (it assembles fine, and even runs but obviously throws an error, because the arguments have no meaningful values):
include \masm32\include64\masm64rt.inc
include \masm32\include64\glu32.inc
includelib \masm32\lib64\glu32.lib

.code
txTitle db "Title", 0
txText db "Text", 0

entry_point proc
LOCAL rval :QWORD
LOCAL qobj:QWORD, innerRadius:REAL8, outerRadius:REAL8, slices:DWORD, loops:DWORD, startAngle:REAL8, sweepAngle:REAL8

; C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Include\gl\GLU.h
; void APIENTRY gluPartialDisk (
;     GLUquadric          *qobj,
;     GLdouble            innerRadius,
;     GLdouble            outerRadius,
;     GLint               slices,
;     GLint               loops,
;     GLdouble            startAngle,
;     GLdouble            sweepAngle);
  int 3   ; for testing
  invoke gluPartialDisk, addr qobj, innerRadius, outerRadius, slices, loops, startAngle, sweepAngle
  invoke MessageBox, 0, addr txText, addr txTitle, MB_OK
  invoke ExitProcess, 0
  ret
entry_point endp

end


However, when launching the exe (only 1536 bytes!) with the debugger, there are no xmm regs passed:
lea rcx,[rbp-78]           |
mov rdx,[rbp-80]           |
mov r8,[rbp-88]            |
mov r9d,[rbp-8C]           |
mov eax,[rbp-90]           |
mov [rsp+20],eax           |
mov rax,[rbp-98]           |
mov [rsp+28],rax           |
mov rax,[rbp-A0]           |
mov [rsp+30],rax           |
call [<&gluPartialDisk>]   |


What am I doing wrong?
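
For comparison, here is a minimal sketch of where the Windows x64 convention expects these seven arguments, written without invoke. Register assignment is by position, so the two radii (arguments 2 and 3) go to XMM1 and XMM2, while the doubles in positions 6 and 7 go on the stack. The sketch assumes that the SDK's includes let `call gluPartialDisk` resolve to the import, as the invoke expansion above does, and that the frame reserves the same [rsp+20h] to [rsp+30h] slots that the disassembly writes to. The locals are still uninitialised, exactly as in the test above, so the call itself is still expected to fail.

```asm
include \masm32\include64\masm64rt.inc
include \masm32\include64\glu32.inc
includelib \masm32\lib64\glu32.lib

.code

entry_point proc
LOCAL qobj:QWORD, innerRadius:REAL8, outerRadius:REAL8, slices:DWORD, loops:DWORD, startAngle:REAL8, sweepAngle:REAL8

  lea   rcx, qobj                      ; arg 1: pointer          -> RCX
  movsd xmm1, innerRadius              ; arg 2: double           -> XMM1 (not RDX)
  movsd xmm2, outerRadius              ; arg 3: double           -> XMM2 (not R8)
  mov   r9d, slices                    ; arg 4: int              -> R9D
  mov   eax, loops
  mov   DWORD PTR [rsp+20h], eax       ; arg 5: int              -> stack
  mov   rax, QWORD PTR startAngle
  mov   QWORD PTR [rsp+28h], rax       ; arg 6: double (raw bits) -> stack
  mov   rax, QWORD PTR sweepAngle
  mov   QWORD PTR [rsp+30h], rax       ; arg 7: double (raw bits) -> stack
  call  gluPartialDisk                 ; assumed to resolve like the invoke version

  invoke ExitProcess, 0
  ret
entry_point endp

end
```

Only the first four argument positions use registers. A double beyond position 4 is stored on the stack as its 8-byte bit pattern, which is why the last two moves can go through RAX.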

P.S.: Interesting that gluPartialDisk does lots of pushing:
000007FEEADFAF20  | 48:8BC4                      | mov rax,rsp                |
000007FEEADFAF23  | 48:8958 08                   | mov [rax+8],rbx            |
000007FEEADFAF27  | 48:8968 10                   | mov [rax+10],rbp           |
000007FEEADFAF2B  | 56                           | push rsi                   |
000007FEEADFAF2C  | 57                           | push rdi                   |
000007FEEADFAF2D  | 41:54                        | push r12                   |
000007FEEADFAF2F  | 41:55                        | push r13                   |
000007FEEADFAF31  | 41:56                        | push r14                   |

hutch--

Load an XMM register with a floating point REAL8 value with "movsd", then read whatever you need in the proc. The REAL8 value must be a memory operand. I use a macro "loadsd", but it is just a wrapper around a data section entry and loading its address with LEA.

NOSTACKFRAME

XmmTest proc

    addsd xmm0, xmm1
    addsd xmm0, xmm2
    addsd xmm0, xmm3
    addsd xmm0, xmm4
    addsd xmm0, xmm5
    addsd xmm0, xmm6
    addsd xmm0, xmm7

    ret

XmmTest endp

STACKFRAME
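
A rough sketch of the loading pattern described above, in plain ml64 without the SDK includes. `my_loadsd` is a hypothetical stand-in written for this example, not the SDK's `loadsd`, whose definition is not shown in this thread. It addresses the data entry directly instead of going through LEA. `SumFour`, `main` and the build line are assumptions for illustration as well.

```asm
; build (assumed): ml64 demo.asm /link /subsystem:console /entry:main kernel32.lib
EXTERN ExitProcess:PROC

; hypothetical helper: put an FP literal in .data and load it with movsd
my_loadsd MACRO xreg, fpval
    LOCAL fpv
    .data
      fpv REAL8 fpval                  ;; the "pseudo immediate" lives in memory
    .code
    movsd xreg, fpv                    ;; movsd needs a memory (or XMM) source
ENDM

.code

SumFour PROC                           ; leaf: xmm0 = xmm0 + xmm1 + xmm2 + xmm3
    addsd xmm0, xmm1
    addsd xmm0, xmm2
    addsd xmm0, xmm3
    ret
SumFour ENDP

main PROC                              ; demo only, no unwind info
    sub   rsp, 28h                     ; shadow space + stack alignment
    my_loadsd xmm0, 1.0
    my_loadsd xmm1, 2.0
    my_loadsd xmm2, 3.0
    my_loadsd xmm3, 4.0
    call  SumFour
    cvttsd2si ecx, xmm0                ; 10.0 -> 10, used as the exit code
    call  ExitProcess
main ENDP

END
```

Under the standard convention only XMM0 to XMM3 carry arguments. Passing further values in XMM4 to XMM7, as XmmTest does, is a private arrangement between caller and callee.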


HSE

Some macros64 modifications:
Code (procedure_call):
  ;; **************************
    ;; first 4 register arguments
    ;; **************************
      IFNB <a1>
        REGISTER a1,cl,cx,ecx,rcx,xmm0
      ENDIF

      IFNB <a2>
        REGISTER a2,dl,dx,edx,rdx,xmm1
      ENDIF

      IFNB <a3>
        REGISTER a3,r8b,r8w,r8d,r8,xmm2
      ENDIF

      IFNB <a4>
        REGISTER a4,r9b,r9w,r9d,r9,xmm3
      ENDIF


Code (REGISTER):

    REGISTER MACRO anum,breg,wreg,dreg,qreg,xreg

      ·······

     IF ssize GT 9                             ;; handle REAL4 PTR
        lead SUBSTR <anum>,1,9
        IFIDNI lead,<REAL4 PTR>
          mov qreg, anum
          cvtss2sd xreg, DWORD PTR anum
          goto elbl
        ENDIF
      ENDIF

      IF ssize GT 9                             ;; handle REAL8 PTR
        lead SUBSTR <anum>,1,9
        IFIDNI lead,<REAL8 PTR>
          mov qreg, anum
          movsd xreg, qword ptr anum
          goto elbl
        ENDIF
      ENDIF


Attached macros64B.inc
Equations in Assembly: SmplMath

hutch--

Hi Hector,

Sorry if I have been a bit slow, I have been ploughing through a mountain of work and got a bit snowed under. I like the idea in the first one, will have to have a play with it to make sure there are no unintended side effects but it looks like a good idea. I have not digested the second one yet.

I am inclined to keep 32 and 64 bit SSE2 separate.

This is a test piece I was working on recently. So far it will load in sequence as many 64 bit pseudo immediates as you point it at, but I ran out of time to finish it off.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    xmm_reg_call MACRO pname, args:VARARG
      LOCAL buff, var
      buff equ <>
      cnt = 0
        for arg, <args>
          buff CATSTR <loadsd >,<xmm>,num2str(cnt),<, >,<arg>   ;; join all strings
          % buff                                                ;; output as string
          cnt = cnt + 1                                         ;; increment register counter
        ENDM
      call pname                                                ;; call the procedure
    ENDM

    xrv MACRO pname, args :VARARG
      xmm_reg_call pname, args
      EXITM < xmm0>
    ENDM

    xinvoke MACRO pname, args :VARARG
      xmm_reg_call pname, args
    ENDM

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL fvar :REAL8
    LOCAL tvar :REAL8
    LOCAL pbuf :QWORD
    LOCAL buff[64] :BYTE

    mov pbuf, ptr$(buff)
    movsd fvar, xrv(XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
    rcall fptoa,fvar,pbuf
    conout "  Result = ",pbuf,lf

    xinvoke XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
    movsd fvar, xmm0
    rcall fptoa,fvar,pbuf
    conout "  Result = ",pbuf,lf

    waitkey
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

XmmTest proc

    addsd xmm0, xmm1
    addsd xmm0, xmm2
    addsd xmm0, xmm3
    addsd xmm0, xmm4
    addsd xmm0, xmm5
    addsd xmm0, xmm6
    addsd xmm0, xmm7

    ret

XmmTest endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end

HSE

Hi Hutch!

No hurry at all  :biggrin:

The line can be shorter using expansion instead of num2str:
        buff CATSTR <loadsd >,<xmm>,%cnt,<, >,<arg>   ;; join all strings


num2str is good for:       %echo num2str(cnt) 
:thumbsup:

hutch--

I have got the earlier macro to work on different data notations and it seems to be working OK. I got a notation to work in an invoke call if it is used as the last argument.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

  ; *************************************************************

    load_xmm_regs MACRO args:VARARG
      LOCAL buff,aattr
      buff equ <>
      cnt = 0

     ;; ---------------------------------------------------------

      for arg, <args>
        aattr = getattr(arg)                                    ;; get the argument attribute
        IF aattr EQ 36                                          ;; INT immediate, exit on error
          .err <   ---{ notation error, missing floating point }--->
            %echo ------------------------------------------------------------------
            %echo ERROR => arg : x86-64 does not support loading immediate integers
            %echo          in an XMM register. Use FP format value instead => arg.0
            %echo          Assembly is TERMINATED at this point
            %echo ------------------------------------------------------------------
          EXITM <>                                              ;; stop the macro here on error
          END                                                   ;; terminate the assembly
        ENDIF
        IF aattr EQ 0
          buff CATSTR <loadsd >,<xmm>,%cnt,<, >,<arg>           ;; pseudo immediate
          goto writeit
        ENDIF
        IF aattr EQ 98
          buff CATSTR <movsd >,<xmm>,%cnt,<, >,<arg>            ;; LOCAL var
          goto writeit
        ENDIF
        IF aattr EQ 42
          buff CATSTR <movsd >,<xmm>,%cnt,<, >,<arg>            ;; GLOBAL var
          goto writeit
        ENDIF
        IF aattr EQ 48
          buff CATSTR <movsd >,<xmm>,%cnt,<, >,<arg>            ;; XMM register
          goto writeit
        ENDIF
      :writeit
        % buff                                                  ;; output as string
        cnt = cnt + 1                                           ;; increment register counter
      ENDM

     ;; ---------------------------------------------------------

    ENDM

  ; *************************************************************

    xrv MACRO pname:REQ, args :VARARG                           ;; function form
      load_xmm_regs args
      call pname                                                ;; call the procedure
      EXITM < xmm0>
    ENDM

    xinvoke MACRO pname:REQ, args :VARARG                       ;; statement form
      load_xmm_regs args
      call pname                                                ;; call the procedure
    ENDM

    xload MACRO args:VARARG
      load_xmm_regs args                                        ;; load XMM regs only
    ENDM

    xloadrv MACRO args:VARARG
      load_xmm_regs args                                        ;; load XMM regs only
      EXITM <>                                                  ;; allow empty function form
    ENDM

  ; *************************************************************

    .data?
      gvar REAL8 ?                      ; global for testing

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL lvar :REAL8
    LOCAL tvar :REAL8
    LOCAL pbuf :QWORD
    LOCAL buff[64] :BYTE


    mov pbuf, ptr$(buff)
    movsd lvar, xrv(XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
    rcall fptoa,lvar,pbuf
    conout "  Result = ",pbuf,lf

    xinvoke XmmTest,1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0
    movsd lvar, xmm0
    rcall fptoa,lvar,pbuf
    conout "  Result = ",pbuf,lf,lf

  ; -------------------------------------------------------------------
  ; with invoke, the integer args must come first. FP load must be last
  ; -------------------------------------------------------------------
    invoke LoadTest,1111,2222,3333,4444, xloadrv(1234.1234, 5678.5678)

    waitkey
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

XmmTest proc

    addsd xmm0, xmm1
    addsd xmm0, xmm2
    addsd xmm0, xmm3
    addsd xmm0, xmm4
    addsd xmm0, xmm5
    addsd xmm0, xmm6
    addsd xmm0, xmm7

    ret

XmmTest endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

LoadTest proc

    USING r12,r13,r14,r15

    LOCAL var1 :REAL8
    LOCAL var2 :REAL8
    LOCAL ptxt :QWORD
    LOCAL text[64]:BYTE

    SaveRegs

    movsd var1, xmm0            ; copy to REAL8 first
    movsd var2, xmm1            ; something in fptoa overwrites at least one XMM register

    mov ptxt, ptr$(text)

    mov r12, rcx
    mov r13, rdx
    mov r14, r8
    mov r15, r9

    conout "  Integer  ",str$(r12),lf
    conout "  Integer  ",str$(r13),lf
    conout "  Integer  ",str$(r14),lf
    conout "  Integer  ",str$(r15),lf,lf

    invoke fptoa,var1,ptxt
    conout "  SD Float ",ptxt,lf

    invoke fptoa,var2,ptxt
    conout "  SD Float ",ptxt,lf

    RestoreRegs

    ret

LoadTest endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end

hutch--

I have split this topic where it shifted from the 64 bit MASM project into wider and different topics. It can be found in the 64 bit assembler forum for any who wish to contribute to that discussion.

This subforum is reserved for 64 bit MASM and I will move anything that is different.

daydreamer

Quote from: hutch-- on April 12, 2021, 06:21:56 AM
I have split this topic where it shifted from the 64 bit MASM project into wider and different topics. It can be found in the 64 bit assembler forum for any who wish to contribute to that discussion.
I am writing some SSE2 scalar math functions for the x64 ABI. If I use xmm0 as input register, what's the best standard return result register to use?
Is there any need for an SSE2 math lib?

my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

HSE

Quote from: daydreamer on August 16, 2022, 09:49:12 PM
if I use xmm0 as input register
Quote from: x64 calling convention
__m128 types, arrays, and strings are never passed by immediate value. Instead, a pointer is passed to memory allocated by the caller. Structs and unions of size 8, 16, 32, or 64 bits, and __m64 types, are passed as if they were integers of the same size. Structs or unions of other sizes are passed as a pointer to memory allocated by the caller. For these aggregate types passed as a pointer, including __m128, the caller-allocated temporary memory must be 16-byte aligned.

Quote
whats best standard return result register to use?
Quote from: x64 calling convention
A scalar return value that can fit into 64 bits, including the __m64 type, is returned through RAX. Non-scalar types including floats, doubles, and vector types such as __m128, __m128i, __m128d are returned in XMM0. The state of unused bits in the value returned in RAX or XMM0 is undefined.

:thumbsup: Good to read x64 ABI from time to time.

hutch--

magnus,

If you are going to use the XMM registers, you would generally pass args in XMM registers rather than doing double conversions to and from XMM registers. For return values, convention says a single return should be in XMM0, but there is nothing to stop you from returning multiple values in more than one XMM register. Writing assembler is more flexible than a C++ compiler, but you must still preserve the non-volatile registers XMM6 and above, and there are some C++ vector calls that use some of the lower XMM registers as well.

If you are going to interface with 64 bit C++, you must understand what register preservations are required.
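
To make the return-register and preservation points concrete, here is a minimal sketch of a scalar SSE2 routine in plain ml64. `SumSquares`, the test values and the build line are made up for illustration. The routine takes x in XMM0 and y in XMM1 and returns the result in XMM0. It uses XMM6 as scratch only to show the save and restore that the Windows x64 convention requires for XMM6 to XMM15. A volatile register such as XMM2 would need no saving at all.

```asm
; build (assumed): ml64 sumsq.asm /link /subsystem:console /entry:main kernel32.lib
EXTERN ExitProcess:PROC

.data
valX REAL8 3.0
valY REAL8 4.0

.code

; double SumSquares(double x, double y)  ->  x*x + y*y, result in XMM0
SumSquares PROC FRAME
    sub    rsp, 18h                    ; 24 bytes: rsp is 16-byte aligned afterwards
    .allocstack 18h
    movaps XMMWORD PTR [rsp], xmm6     ; save non-volatile XMM6
    .savexmm128 xmm6, 0
    .endprolog

    movsd  xmm6, xmm1                  ; scratch copy of y
    mulsd  xmm6, xmm6                  ; y*y
    mulsd  xmm0, xmm0                  ; x*x
    addsd  xmm0, xmm6                  ; return value in XMM0

    movaps xmm6, XMMWORD PTR [rsp]     ; restore XMM6 before returning
    add    rsp, 18h
    ret
SumSquares ENDP

main PROC                              ; demo only, no unwind info
    sub    rsp, 28h                    ; shadow space + stack alignment
    movsd  xmm0, valX                  ; arg 1 -> XMM0
    movsd  xmm1, valY                  ; arg 2 -> XMM1
    call   SumSquares
    cvttsd2si ecx, xmm0                ; 25.0 -> 25, used as the exit code
    call   ExitProcess
main ENDP

END
```

Keeping input and output in XMM0 means such routines can be chained without extra moves, and a C or C++ caller can declare the routine as an ordinary `double f(double, double)`.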