Re: X64 ABI, REAL8 passed in xmmreg?

Started by jj2007, April 09, 2021, 11:23:41 PM

jj2007

Quote from: HSE on April 11, 2021, 01:50:15 AM
Quote from: jj2007 on April 11, 2021, 01:32:55 AM
Coming soon - work in progress :biggrin:

:thumbsup: Remember we need prototypes for shared libraries. It's almost irrelevant for our own internal functions.

I know, and that's a problem. Not for my macros, actually, they do handle prototypes correctly. I guess I have to extract them from some C compiler's headers. Have you ever checked \Masm32\include\*.inc for the correctness of the PROTOs? :cool:

hutch--

Hector,

Same old problem: you are confusing compilers and assemblers. Most compilers are designed to use prototypes, but not all. The Microsoft x64 ABI is based on C++, which is compiled, but again, MASM does not support prototypes as it is NOT a compiler. With the sense of prototypes that you have in mind, to be x64 ABI compliant you would be using C++, not MASM, and while that is a viable route, MASM simply does not support prototyping, so it is a dead issue at best.

Remove C++ from your equation and you can routinely write ABI compliant assembler code in MASM.

HSE

Quote from: jj2007 on April 11, 2021, 02:14:07 AM
I guess I have to extract them from some C compiler's headers. Have you ever checked \Masm32\include\*.inc for the correctness of the PROTOs? :cool:
I think libraries don't care what assembler you use. If you have doubts, you can try others include sets like Nidud's, Biterider's, Yves', no need to make a new one.

hutch--

 :biggrin:

I am not writing shared libraries, I only write MASM libraries and while others may be able to use them, they will have to cobble together their own prototypes because in case you have missed it, MASM does not use prototypes. The Watcom derivatives may want to end up as compilers but MASM does not need to do that, that is what 64 bit CL.EXE is for.

> If you have doubts, you can try others include sets like Nidud's, Biterider's, Yves', no need to make a new one.

Why bother wasting time when MASM does not need or use the Watcom derivatives' prototypes? I already have a near exhaustive set for the linker, so why would I settle for the booby prize? :tongue:

tenkey

Back to "REAL8 in xmm reg?"
If you are interfacing to MSVC C/C++ APIs (gcc has a different one), the answer is yes if it's one of the first four arguments of the C/C++ function definition or call. Win64 functions are MSVC types of functions.
I compiled a simple C program in VS2019 (Community) to find out what regs are used.
#include <stdio.h>
float sumf(float a, float b, float c, float d)
{
    return a + b + c + d;
}

double sumd(double a, double b, double c, double d)
{
    return a + b + c + d;
}

int main(int argc, char** argv)
{
    float f = sumf(1.0, 2.0, 3.0, 4.0);
    double d = sumd(1.0, 2.0, 3.0, 4.0);
    printf("%d: float = %f, double = %f\n", 1, f, d);
}


Compiling in release mode with optimizations enabled produces:
; Function compile flags: /Ogtpy
; COMDAT sumf
_TEXT SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumf PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 4
addss xmm0, xmm1
addss xmm0, xmm2
addss xmm0, xmm3
; Line 5
ret 0
sumf ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT sumd
_TEXT SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumd PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 9
addsd xmm0, xmm1
addsd xmm0, xmm2
addsd xmm0, xmm3
; Line 10
ret 0
sumd ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT main
_TEXT SEGMENT
argc$ = 48
argv$ = 56
main PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 13
$LN8:
sub rsp, 40 ; 00000028H
; Line 16
movsd xmm2, QWORD PTR __real@4024000000000000
lea rcx, OFFSET FLAT:??_C@_0BN@KKHEANPC@?$CFd?3?5float?5?$DN?5?$CFf?0?5double?5?$DN?5?$CFf?6@
movaps xmm3, xmm2
movq r9, xmm2
movq r8, xmm2
mov edx, 1
call printf
; Line 17
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
main ENDP
_TEXT ENDS


sumf expects the values to be in xmm0, xmm1, xmm2, and xmm3.
It leaves the result in xmm0.
rcx, rdx, r8, r9, and the shadow space are untouched.
It uses REAL4 instructions.

sumd is the same, except that it uses REAL8 instructions.

Because they are in the same file and are called with constants, the compiler has optimized away the calls to sumf and sumd. The main function is reduced to:
    printf("%d: float = %f, double = %f\n", 1, 10.0, 10.0);


The REAL8 10.0 is stored in CONST memory, and because printf has varargs, the 10.0 is loaded into xmm2, xmm3, r8, and r9.

The unoptimized debug version retains the function calls:
; Line 14
movss xmm3, DWORD PTR __real@40800000
movss xmm2, DWORD PTR __real@40400000
movss xmm1, DWORD PTR __real@40000000
movss xmm0, DWORD PTR __real@3f800000
call sumf
movss DWORD PTR f$[rbp], xmm0
; Line 15
movsd xmm3, QWORD PTR __real@4010000000000000
movsd xmm2, QWORD PTR __real@4008000000000000
movsd xmm1, QWORD PTR __real@4000000000000000
movsd xmm0, QWORD PTR __real@3ff0000000000000
call sumd
movsd QWORD PTR d$[rbp], xmm0


Because sumf and sumd do not have varargs, registers rcx, rdx, r8, and r9 are untouched.

That's how MSVC works.

jj2007

Nice example, tenkey :thumbsup:

Here is my translation of your example (naked means no shuffling of passed regs into shadow space); to make it more complicated, I used mixed arguments:
include \Masm32\MasmBasic\Res\JBasic.inc ; ## builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
DefProc sumD proc !<naked!> argDouble0:REAL8, argDword1, argDouble2:REAL8, argDword3
  addsd xmm0, xmm2 ; REAL8 add in xmm regs
  movd rax, xmm0
  add rdx, r9 ; sum of two integer args
  cvtsi2sd xmm1, rdx ; to double
  addsd xmm0, xmm1 ; REAL8 add in xmm regs
  movd rax, xmm0 ; return REAL8 in xmm0 and rax
  ret
sumD endp
DefProc sumF proc !<naked!> argDouble0:REAL4, argDword1, argDouble2:REAL4, argDword3
  addss xmm0, xmm2 ; REAL4 add in xmm regs
  cvtss2sd xmm0, xmm0 ; REAL4 to REAL8
  add rdx, r9 ; sum of two integer args
  cvtsi2sd xmm1, rdx ; to double
  addsd xmm0, xmm1 ; REAL8 add in xmm regs
  movd rax, xmm0 ; return REAL8 in xmm0 and rax
  ret
sumF endp
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
;   int 3
  Print Str$("Result sum D: %f\n", rv(sumD, FP8(1000.11111), 100, FP8(10.0), 1))
  Print Str$("Result sum F: %f\n", rv(sumF, FP4(1000.11111), 100, FP4(10.0), 1))
EndOfCode ; OPT_Assembler  ML


Output:
This program was assembled with ml64 in 64-bit format.
Result sum D: 1111.111110
Result sum F: 1111.111084


One interesting aspect here: you can't feed REAL4 ("single") values to printf(), and you can't feed it directly with xmm regs. The CRT printf() wants a double in rax (or another register or REAL8 memory location). Therefore both sumD and sumF return doubles in xmm0 and rax.

I attach two executables, one with an int 3 right before the call to sumD. Run it from a DOS prompt; it won't wait for a keypress.

hutch--

tenkey is correct,

sumf expects the values to be in xmm0, xmm1, xmm2, and xmm3.
It leaves the result in xmm0.
rcx, rdx, r8, r9, and the shadow space are untouched.
It uses REAL4 instructions.

The Microsoft x64 ABI specifies the first 4 XMM registers for arguments and XMM0 for the return value, if any. This is how you would interact with an external high-level language that requires ABI calling convention conformity. With a 64-bit MASM app you can use the first 6 XMM registers and still remain ABI compliant in terms of register usage, but that is not compatible with either the Win API or external procedure calls.

Also note that at least some CRT functions use the XMM registers and will overwrite any of the first 4 XMM registers.

This is why in a macro I posted above, you DO NOT MIX integer registers and XMM registers.

  ; -------------------------------------------------------------------
  ; with invoke, the integer args must come first. FP load must be last
  ; -------------------------------------------------------------------
    invoke LoadTest,1111,2222,3333,4444, xloadrv(1234.1234, 5678.5678)

It certainly can be done other ways but this tests up reliably.

jj2007

Quote from: tenkey on April 11, 2021, 08:11:22 AM
Compiling in release mode with optimizations enabled produces:

; Line 16
movsd xmm2, QWORD PTR __real@4024000000000000
lea rcx, OFFSET FLAT:??_C@_0BN@KKHEANPC@?$CFd?3?5float?5?$DN?5?$CFf?0?5double?5?$DN?5?$CFf?6@
movaps xmm3, xmm2
movq r9, xmm2
movq r8, xmm2
mov edx, 1
call printf


Because sumf and sumd do not have varargs, registers rcx, rdx, r8, and r9 are untouched.

That's how MSVC works.

The first four arguments are in rcx rdx r8 r9, but as you write correctly, they are untouched because sumd and sumf are not VARARG. However, printf() is a VARARG function:
float f = sumf(1.0, 2.0, 3.0, 4.0);
double d = sumd(1.0, 2.0, 3.0, 4.0);
printf("%d: float = %f, double = %f\n", 1, f, d);

rcx is the format string
rdx is the constant 1
r8 is f
r9 is d
xmm2 is a copy of f, d (the compiler seems to know that they are identical)
xmm3 is a copy of f, d


From my tests it seems that xmm2 and xmm3 are ignored by printf().

MSVC may behave differently if you add, in sumf and sumd, a random value to the sum, such as GetTickCount().

nidud

#23
deleted

jj2007

Quote from: nidud on April 12, 2021, 12:03:48 AM
    movsd xmm0,1.0
    movsd xmm1,2.0

    printf("xmm0: %f\nxmm1: %f\n", xmm0, xmm1)

Output:

xmm0: 1.000000
xmm1: 2.000000


Post the executable :biggrin:

nidud

#25
deleted

jj2007

#26
Quote from: nidud on April 12, 2021, 02:15:19 AM
Assembly is actually a low-level programming language believe it or not

I believe you :tongue:

        movq    r8, xmm1                                ; 0014 _ 66 49: 0F 7E. C8
        movq    rdx, xmm0                               ; 0019 _ 66 48: 0F 7E. C2
        lea     rcx, [DS0000]                           ; 001E _ 48: 8D. 0D, 00000000(rel)
        call    printf                                  ; 0025 _ E8, 00000000(rel)


Thanks, that's why I asked you to post the exe :cool:

It confirms my tests: internally, printf uses the 64-bit integer registers, not the xmm registers. On the web, some people claim that you must pass the number of xmm registers used in eax (or rather al), but that doesn't look plausible.

nidud

#27
deleted

jj2007

Quote from: nidud on April 12, 2021, 08:14:42 AM
Quote
It confirms my tests: internally, printf uses the 64-bit integer registers, not the xmm registers. On the web, some people claim that you must pass the number of xmm registers used in eax (or rather al), but that doesn't look plausible.

That is true for Linux.

Ok, makes sense. I wonder how it works with mixed arguments, such as printf("%f %f %f %f", xmm0, MyDouble1, xmm1, MyDouble2)

nidud

#29
deleted