Author Topic: Re: X64 ABI, REAL8 passed in xmmreg?  (Read 2061 times)

jj2007

  • Member
  • *****
  • Posts: 11550
  • Assembler is fun ;-)
    • MasmBasic
Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #15 on: April 11, 2021, 02:14:07 AM »
Coming soon - work in progress :biggrin:

 :thumbsup: Remember we need prototypes for shared libraries. It's almost irrelevant for our own internal functions.

I know, and that's a problem. Not for my macros, actually, they do handle prototypes correctly. I guess I have to extract them from some C compiler's headers. Have you ever checked \Masm32\include\*.inc for the correctness of the PROTOs? :cool:

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 8491
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #16 on: April 11, 2021, 02:32:51 AM »
Hector,

Same old problem: you are confusing compilers and assemblers. Most compilers are designed to use prototypes, but not all. The Microsoft x64 ABI is based on C++, which is compiled, but again, MASM does not support prototypes as it is NOT a compiler. With the sense of prototypes that you have in mind, to be x64 ABI compliant you would be using C++, not MASM, and while that is a viable route, MASM simply does not support prototyping, so it is a dead issue at best.

Remove C++ from your equation and you can routinely write ABI compliant assembler code in MASM.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

HSE

  • Member
  • *****
  • Posts: 1741
  • <AMD>< 7-32>
Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #17 on: April 11, 2021, 02:36:29 AM »
> I guess I have to extract them from some C compiler's headers. Have you ever checked \Masm32\include\*.inc for the correctness of the PROTOs? :cool:
I think libraries don't care what assembler you use. If you have doubts, you can try other include sets like Nidud's, Biterider's, Yves'; no need to make a new one.

hutch--

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #18 on: April 11, 2021, 03:32:20 AM »
 :biggrin:

I am not writing shared libraries. I only write MASM libraries, and while others may be able to use them, they will have to cobble together their own prototypes because, in case you have missed it, MASM does not use prototypes. The Watcom derivatives may want to end up as compilers, but MASM does not need to do that; that is what the 64-bit CL.EXE is for.

> If you have doubts, you can try others include sets like Nidud's, Biterider's, Yves', no need to make a new one.

Why bother wasting time when MASM does not need or use the Watcom derivatives' prototypes? I already have a near-exhaustive set for the linker, so why would I settle for the booby prize?  :tongue:

tenkey

  • Regular Member
  • *
  • Posts: 39
Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #19 on: April 11, 2021, 08:11:22 AM »
Back to "REAL8 in xmm reg?"
If you are interfacing to MSVC C/C++ APIs (gcc has a different one), the answer is yes if it's one of the first four arguments of the C/C++ function definition or call. Win64 functions are MSVC types of functions.
I compiled a simple C program in VS2019 (Community) to find out what regs are used.
Code: [Select]
#include <stdio.h>
float sumf(float a, float b, float c, float d)
{
return a + b + c + d;
}

double sumd(double a, double b, double c, double d)
{
return a + b + c + d;
}

int main(int argc, char** argv)
{
float f = sumf(1.0, 2.0, 3.0, 4.0);
double d = sumd(1.0, 2.0, 3.0, 4.0);
printf("%d: float = %f, double = %f\n", 1, f, d);
}

Compiling in release mode with optimizations enabled produces:
Code: [Select]
; Function compile flags: /Ogtpy
; COMDAT sumf
_TEXT SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumf PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 4
addss xmm0, xmm1
addss xmm0, xmm2
addss xmm0, xmm3
; Line 5
ret 0
sumf ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT sumd
_TEXT SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumd PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 9
addsd xmm0, xmm1
addsd xmm0, xmm2
addsd xmm0, xmm3
; Line 10
ret 0
sumd ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT main
_TEXT SEGMENT
argc$ = 48
argv$ = 56
main PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 13
$LN8:
sub rsp, 40 ; 00000028H
; Line 16
movsd xmm2, QWORD PTR __real@4024000000000000
lea rcx, OFFSET FLAT:??_C@_0BN@KKHEANPC@?$CFd?3?5float?5?$DN?5?$CFf?0?5double?5?$DN?5?$CFf?6@
movaps xmm3, xmm2
movq r9, xmm2
movq r8, xmm2
mov edx, 1
call printf
; Line 17
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
main ENDP
_TEXT ENDS

sumf expects the values to be in xmm0, xmm1, xmm2, and xmm3.
It leaves the result in xmm0.
rcx, rdx, r8, r9, and the shadow space are untouched.
It uses REAL4 instructions.

sumd is the same, except that it uses REAL8 instructions.

Because it is in the same file and the functions are called with constants, the compiler has optimized away the calls to sumf and sumd. The main function is reduced to:
Code: [Select]
    printf("%d: float = %f, double = %f\n", 1, 10.0, 10.0);

The REAL8 10.0 is stored in CONST memory, and because printf has varargs, the 10.0 is loaded into xmm2, xmm3, r8, and r9.

The unoptimized debug version retains the function calls:
Code: [Select]
; Line 14
movss xmm3, DWORD PTR __real@40800000
movss xmm2, DWORD PTR __real@40400000
movss xmm1, DWORD PTR __real@40000000
movss xmm0, DWORD PTR __real@3f800000
call sumf
movss DWORD PTR f$[rbp], xmm0
; Line 15
movsd xmm3, QWORD PTR __real@4010000000000000
movsd xmm2, QWORD PTR __real@4008000000000000
movsd xmm1, QWORD PTR __real@4000000000000000
movsd xmm0, QWORD PTR __real@3ff0000000000000
call sumd
movsd QWORD PTR d$[rbp], xmm0

Because sumf and sumd do not have varargs, registers rcx, rdx, r8, and r9 are untouched.

That's how MSVC works.

jj2007

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #20 on: April 11, 2021, 09:54:40 AM »
Nice example, tenkey :thumbsup:

Here is my translation of your example (naked means no shuffling of passed regs into shadow space); to make it more complicated, I used mixed arguments:
Code: [Select]
include \Masm32\MasmBasic\Res\JBasic.inc ; ## builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
DefProc sumD proc !<naked!> argDouble0:REAL8, argDword1, argDouble2:REAL8, argDword3
  addsd xmm0, xmm2 ; REAL8 add in xmm regs
  movd rax, xmm0
  add rdx, r9 ; sum of two integer args
  cvtsi2sd xmm1, rdx ; to double
  addsd xmm0, xmm1 ; REAL8 add in xmm regs
  movd rax, xmm0 ; return REAL8 in xmm0 and rax
  ret
sumD endp
DefProc sumF proc !<naked!> argDouble0:REAL4, argDword1, argDouble2:REAL4, argDword3
  addss xmm0, xmm2 ; REAL4 add in xmm regs
  cvtss2sd xmm0, xmm0 ; REAL4 to REAL8
  add rdx, r9 ; sum of two integer args
  cvtsi2sd xmm1, rdx ; to double
  addsd xmm0, xmm1 ; REAL8 add in xmm regs
  movd rax, xmm0 ; return REAL8 in xmm0 and rax
  ret
sumF endp
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
;   int 3
  Print Str$("Result sum D: %f\n", rv(sumD, FP8(1000.11111), 100, FP8(10.0), 1))
  Print Str$("Result sum F: %f\n", rv(sumF, FP4(1000.11111), 100, FP4(10.0), 1))
EndOfCode ; OPT_Assembler  ML

Output:
Code: [Select]
This program was assembled with ml64 in 64-bit format.
Result sum D: 1111.111110
Result sum F: 1111.111084

One interesting aspect here: you can't feed REAL4 ("single") values to printf(), and you can't feed it directly with xmm regs. The CRT printf() wants a double in rax (or another register or REAL8 memory location). Therefore both sumD and sumF return doubles in xmm0 and rax.

I attach two executables, one with an int 3 right before the call to sumD. Run it from a DOS prompt, it won't wait for a keypress.

hutch--

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #21 on: April 11, 2021, 10:25:29 AM »
tenkey is correct,

> sumf expects the values to be in xmm0, xmm1, xmm2, and xmm3.
> It leaves the result in xmm0.
> rcx, rdx, r8, r9, and the shadow space are untouched.
> It uses REAL4 instructions.

The Microsoft x64 ABI specifies the first 4 XMM registers for arguments and XMM0 for the return value, if any. This is how you would interact with an external high-level language that requires ABI calling convention conformity. With a 64-bit MASM app you can use the first 6 XMM registers and still remain ABI compliant in terms of register usage, but that is not compatible with either the Win API or external procedure calls.

Also note that at least some CRT functions use the XMM registers and will overwrite any of the first 4 XMM registers.

This is why in a macro I posted above, you DO NOT MIX integer registers and XMM registers.

  ; -------------------------------------------------------------------
  ; with invoke, the integer args must come first. FP load must be last
  ; -------------------------------------------------------------------
    invoke LoadTest,1111,2222,3333,4444, xloadrv(1234.1234, 5678.5678)

It certainly can be done other ways but this tests up reliably.

jj2007

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #22 on: April 11, 2021, 10:41:16 AM »
Compiling in release mode with optimizations enabled produces:

Code: [Select]
; Line 16
movsd xmm2, QWORD PTR __real@4024000000000000
lea rcx, OFFSET FLAT:??_C@_0BN@KKHEANPC@?$CFd?3?5float?5?$DN?5?$CFf?0?5double?5?$DN?5?$CFf?6@
movaps xmm3, xmm2
movq r9, xmm2
movq r8, xmm2
mov edx, 1
call printf

Because sumf and sumd do not have varargs, registers rcx, rdx, r8, and r9 are untouched.

That's how MSVC works.

The first four arguments are in rcx rdx r8 r9, but as you write correctly, they are untouched because sumd and sumf are not VARARG. However, printf() is a VARARG function:
Code: [Select]
float f = sumf(1.0, 2.0, 3.0, 4.0);
double d = sumd(1.0, 2.0, 3.0, 4.0);
printf("%d: float = %f, double = %f\n", 1, f, d);
Code: [Select]
rcx is the format string
rdx is the constant 1
r8 is f
r9 is d
xmm2 is a copy of f, d (the compiler seems to know that they are identical)
xmm3 is a copy of f, d

From my tests it seems that xmm2 and xmm3 are ignored by printf().

MSVC may behave differently if you add, in sumf and sumd, a random value to the sum, such as GetTickCount().

nidud

  • Member
  • *****
  • Posts: 2212
    • https://github.com/nidud/asmc
Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #23 on: April 12, 2021, 12:03:48 AM »
> One interesting aspect here: you can't feed REAL4 ("single") values to printf(), and you can't feed it directly with xmm regs. The CRT printf() wants a double in rax (or another register or REAL8 memory location). Therefore both sumD and sumF return doubles in xmm0 and rax.

 :biggrin:

Calling msvcrt.dll printf()

; build: asmc64 -pe test.asm

include stdio.inc

    .code

main proc

    movsd xmm0,1.0
    movsd xmm1,2.0

    printf("xmm0: %f\nxmm1: %f\n", xmm0, xmm1)
    ret

main endp

    end main

Output:

xmm0: 1.000000
xmm1: 2.000000

jj2007

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #24 on: April 12, 2021, 12:09:15 AM »
Code: [Select]
    movsd xmm0,1.0
    movsd xmm1,2.0

    printf("xmm0: %f\nxmm1: %f\n", xmm0, xmm1)
Output:

xmm0: 1.000000
xmm1: 2.000000


Post the executable :biggrin:

nidud

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #25 on: April 12, 2021, 02:15:19 AM »
> Post the executable :biggrin:

Assembly is actually a low-level programming language believe it or not. It may be produced by a high-level programming language (such as C/C++/Asmc) but can also be written from scratch.

Since you can't share the binary machine code produced by the assembler in this forum in any meaningful way, you have to produce a disassembled source from the output that you can post here.

This may be achieved by adding a switch to the assembler to produce a listing file or using an object converter to produce a disassembly of the binary object.

objconv -fasm test.obj test.txt

Code: [Select]
; Disassembly of file: test.obj
; Sun Apr 11 17:52:24 2021
; Mode: 64 bits
; Syntax: MASM/ML64
; Instruction set: SSE2, x64
option dotname

public main

extern printf: near


_text   SEGMENT PARA 'CODE'                             ; section number 1

main    PROC
        sub     rsp, 40                                 ; 0000 _ 48: 83. EC, 28
        movsd   xmm0, qword ptr [F0000]                 ; 0004 _ F2: 0F 10. 05, 00000000(rel)
        movsd   xmm1, qword ptr [F0001]                 ; 000C _ F2: 0F 10. 0D, 00000000(rel)
        movq    r8, xmm1                                ; 0014 _ 66 49: 0F 7E. C8
        movq    rdx, xmm0                               ; 0019 _ 66 48: 0F 7E. C2
        lea     rcx, [DS0000]                           ; 001E _ 48: 8D. 0D, 00000000(rel)
        call    printf                                  ; 0025 _ E8, 00000000(rel)
        add     rsp, 40                                 ; 002A _ 48: 83. C4, 28
        ret                                             ; 002E _ C3
main    ENDP

_text   ENDS

_data   SEGMENT PARA 'DATA'                             ; section number 2

F0000   label qword
        dq 3FF0000000000000H                            ; 0000 _ 1.0

F0001   dq 4000000000000000H                            ; 0008 _ 2.0

DS0000  label byte
        db 78H, 6DH, 6DH, 30H, 3AH, 20H, 25H, 66H       ; 0010 _ xmm0: %f
        db 0AH, 78H, 6DH, 6DH, 31H, 3AH, 20H, 25H       ; 0018 _ .xmm1: %
        db 66H, 0AH, 00H                                ; 0020 _ f..

_data   ENDS

.drectve SEGMENT BYTE 'CONST'                           ; section number 3

        db 2DH, 64H, 65H, 66H, 61H, 75H, 6CH, 74H       ; 0000 _ -default
        db 6CH, 69H, 62H, 3AH, 6CH, 69H, 62H, 63H       ; 0008 _ lib:libc
        db 2EH, 6CH, 69H, 62H, 20H, 2DH, 65H, 6EH       ; 0010 _ .lib -en
        db 74H, 72H, 79H, 3AH, 6DH, 61H, 69H, 6EH       ; 0018 _ try:main
        db 20H                                          ; 0020 _ 

.drectve ENDS

END

However, according to the Win64 ABI the caller and callee are interlinked, so in order to understand how arguments are passed and used you need to disassemble both of them. The callee finishes loading the C stack frame used by the ABI, depending on the function type.

The variable argument list (VARARG) is a pure C construct, which means basic stack work, so the registers are not really used here in the same way as for other functions.

printf proc format:ptr sbyte, argptr:vararg
    ret
printf endp

        mov     qword ptr [rsp+20H], r9
        mov     qword ptr [rsp+18H], r8
        mov     qword ptr [rsp+10H], rdx
        mov     qword ptr [rsp+8H], rcx
        push    rbp
        mov     rbp, rsp
        sub     rsp, 32
        leave
        ret

As you see here, all arguments are now in the C stack frame. They are stored there regardless of whether they exist or are used, in accordance with the ABI.

jj2007

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #26 on: April 12, 2021, 06:34:13 AM »
> Assembly is actually a low-level programming language believe it or not

I believe you :tongue:

Code: [Select]
        movq    r8, xmm1                                ; 0014 _ 66 49: 0F 7E. C8
        movq    rdx, xmm0                               ; 0019 _ 66 48: 0F 7E. C2
        lea     rcx, [DS0000]                           ; 001E _ 48: 8D. 0D, 00000000(rel)
        call    printf                                  ; 0025 _ E8, 00000000(rel)

Thanks, that's why I asked you to post the exe :cool:

It confirms my tests: printf internally uses the reg64, not the xmmregs. On the web, some people claim that you must pass the number of xmmregs used in eax or al, but that doesn't look plausible.
« Last Edit: April 12, 2021, 07:53:07 AM by jj2007 »

nidud

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #27 on: April 12, 2021, 08:14:42 AM »
> Thanks, that's why I asked you to post the exe :cool:

Wouldn't it be easier to just ask for the listing/disassembly instead, in general, as opposed to always depending on a debugger to read code? The build command was also included, and you could easily have skipped the include file by prototyping printf yourself, hence the rant.

Quote
It confirms my tests: printf internally uses the reg64, not the xmmregs. On the web, some people claim that you must pass the number of xmmregs used in eax or al, but that doesn't look plausible.

That is true for Linux.

In linux it's necessary to set rax register with how many xmm registers are being used with printf. Maybe a try in windows.

I assumed this to be a boolean value indicating whether XMM registers were actually used, but this seems to be correct.
Code: [Select]
printf proc syscall format:ptr sbyte, argptr:vararg
    ret
printf endp

main proc
    movsd xmm0,1.0
    movsd xmm1,2.0
    printf("xmm0: %f\nxmm1: %f\n", xmm0, xmm1)
    ret
main endp

printf  PROC
        push    rbp
        mov     rbp, rsp
        leave
        ret
printf  ENDP

main    PROC
        movsd   xmm0, qword ptr [F0000]
        movsd   xmm1, qword ptr [F0001]
        lea     rdi, [DS0000]
        mov     eax, 1
        call    printf
        ret
main    ENDP

jj2007

Re: Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #28 on: April 12, 2021, 08:17:40 AM »
Quote
It confirms my tests: printf internally uses the reg64, not the xmmregs. On the web, some people claim that you must pass the number of xmmregs used in eax or al, but that doesn't look plausible.

That is true for Linux.

Ok, makes sense. I wonder how it works with mixed arguments, such as printf("%f %f %f %f", xmm0, MyDouble1, xmm1, MyDouble2)

nidud

Re: X64 ABI, REAL8 passed in xmmreg?
« Reply #29 on: April 12, 2021, 08:04:36 PM »
Well, it's a mixed bag, as it kicks in at the :VARARG position. All args before that are handled as "regular" args and the rest are passed in general-purpose registers.

In this case the first two are passed in SIMD registers and the last two in x64 regs.

foo proto :real4, :real8, :vararg

    foo(xmm4,xmm5,xmm6,xmm7)

        movq    r9, xmm7
        movq    r8, xmm6
        movsd   xmm1, xmm5
        movss   xmm0, xmm4
        call    foo