Re: X64 ABI, REAL8 passed in xmmreg?

jj2007 · April 11, 2021, 02:14:07 AM

Quote from: HSE on April 11, 2021, 01:50:15 AM
Quote from: jj2007 on April 11, 2021, 01:32:55 AM
Coming soon - work in progress

Remember we need prototypes for shared libraries. It's almost irrelevant for our own internal functions.

I know, and that's a problem. Not for my macros, actually; they do handle prototypes correctly. I guess I have to extract them from some C compiler's headers. Have you ever checked \Masm32\include\*.inc for the correctness of the PROTOs?

hutch-- · April 11, 2021, 02:32:51 AM

Hector,

Same old problem: you are confusing compilers and assemblers. Most compilers are designed to use prototypes, but not all. The Microsoft x64 ABI is based on C++, which is a compiled language, but again, MASM does not support prototypes as it is NOT a compiler. With prototypes in the sense you have in mind, to be x64 ABI compliant you would be using C++, not MASM. While that is a viable route, MASM simply does not support prototyping, so it is a dead issue at best.

Remove C++ from your equation and you can routinely write ABI compliant assembler code in MASM.

HSE · April 11, 2021, 02:36:29 AM

Quote from: jj2007 on April 11, 2021, 02:14:07 AM
I guess I have to extract them from some C compiler's headers. Have you ever checked \Masm32\include\*.inc for the correctness of the PROTOs?

I think libraries don't care what assembler you use. If you have doubts, you can try other include sets like Nidud's, Biterider's or Yves', no need to make a new one.

hutch-- · April 11, 2021, 03:32:20 AM

I am not writing shared libraries; I only write MASM libraries. While others may be able to use them, they will have to cobble together their own prototypes because, in case you have missed it, MASM does not use prototypes. The Watcom derivatives may want to end up as compilers, but MASM does not need to do that; that is what 64-bit CL.EXE is for.

> If you have doubts, you can try others include sets like Nidud's, Biterider's, Yves', no need to make a new one.

Why bother wasting time when MASM does not need or use the Watcom derivatives' prototypes? I already have a near exhaustive set for the linker, so why would I settle for the booby prize?

tenkey · April 11, 2021, 08:11:22 AM

Back to "REAL8 in xmm reg?"
If you are interfacing to MSVC C/C++ APIs (gcc on Linux uses a different convention, the System V ABI), the answer is yes if it's one of the first four arguments of the C/C++ function definition or call. Win64 functions are MSVC-style functions.
I compiled a simple C program in VS2019 (Community) to find out what regs are used.

#include <stdio.h>
float sumf(float a, float b, float c, float d)
{
	return a + b + c + d;
}

double sumd(double a, double b, double c, double d)
{
	return a + b + c + d;
}

int main(int argc, char** argv)
{
	float f = sumf(1.0, 2.0, 3.0, 4.0);
	double d = sumd(1.0, 2.0, 3.0, 4.0);
	printf("%d: float = %f, double = %f\n", 1, f, d);
}

Compiling in release mode with optimizations enabled produces:

; Function compile flags: /Ogtpy
;	COMDAT sumf
_TEXT	SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumf	PROC						; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 4
	addss	xmm0, xmm1
	addss	xmm0, xmm2
	addss	xmm0, xmm3
; Line 5
	ret	0
sumf	ENDP
_TEXT	ENDS
; Function compile flags: /Ogtpy
;	COMDAT sumd
_TEXT	SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumd	PROC						; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 9
	addsd	xmm0, xmm1
	addsd	xmm0, xmm2
	addsd	xmm0, xmm3
; Line 10
	ret	0
sumd	ENDP
_TEXT	ENDS
; Function compile flags: /Ogtpy
;	COMDAT main
_TEXT	SEGMENT
argc$ = 48
argv$ = 56
main	PROC						; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 13
$LN8:
	sub	rsp, 40					; 00000028H
; Line 16
	movsd	xmm2, QWORD PTR __real@4024000000000000
	lea	rcx, OFFSET FLAT:??_C@_0BN@KKHEANPC@?$CFd?3?5float?5?$DN?5?$CFf?0?5double?5?$DN?5?$CFf?6@
	movaps	xmm3, xmm2
	movq	r9, xmm2
	movq	r8, xmm2
	mov	edx, 1
	call	printf
; Line 17
	xor	eax, eax
	add	rsp, 40					; 00000028H
	ret	0
main	ENDP
_TEXT	ENDS

sumf expects the values to be in xmm0, xmm1, xmm2, and xmm3.
It leaves the result in xmm0.
rcx, rdx, r8, r9, and the shadow space are untouched.
It uses REAL4 instructions.

sumd is the same, except that it uses REAL8 instructions.

Because it is in the same file and the functions are called with constants, the compiler has optimized away the calls to sumf and sumd. The main function is reduced to:

    printf("%d: float = %f, double = %f\n", 1, 10.0, 10.0);

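The REAL8 10.0 is stored in CONST memory. Because printf takes varargs, the ABI requires a floating-point argument in one of the first four slots to be passed in both the XMM register and the matching integer register, so 10.0 is loaded into xmm2 and xmm3 and also into r8 and r9.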

The unoptimized debug version retains the function calls:

; Line 14
	movss	xmm3, DWORD PTR __real@40800000
	movss	xmm2, DWORD PTR __real@40400000
	movss	xmm1, DWORD PTR __real@40000000
	movss	xmm0, DWORD PTR __real@3f800000
	call	sumf
	movss	DWORD PTR f$[rbp], xmm0
; Line 15
	movsd	xmm3, QWORD PTR __real@4010000000000000
	movsd	xmm2, QWORD PTR __real@4008000000000000
	movsd	xmm1, QWORD PTR __real@4000000000000000
	movsd	xmm0, QWORD PTR __real@3ff0000000000000
	call	sumd
	movsd	QWORD PTR d$[rbp], xmm0

Because sumf and sumd do not have varargs, registers rcx, rdx, r8, and r9 are untouched.

That's how MSVC works.

jj2007 · April 11, 2021, 09:54:40 AM

Nice example, tenkey

Here is my translation of your example (naked means no shuffling of passed regs into shadow space); to make it more complicated, I used mixed arguments:

include \Masm32\MasmBasic\Res\JBasic.inc	; ## builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
DefProc sumD proc !<naked!> argDouble0:REAL8, argDword1, argDouble2:REAL8, argDword3
  addsd xmm0, xmm2	; REAL8 add in xmm regs
  add edx, r9d		; sum of two DWORD args (upper halves of rdx/r9 are not guaranteed)
  cvtsi2sd xmm1, edx	; to double
  addsd xmm0, xmm1	; REAL8 add in xmm regs
  movd rax, xmm0	; return REAL8 in xmm0 and rax
  ret
sumD endp
DefProc sumF proc !<naked!> argSingle0:REAL4, argDword1, argSingle2:REAL4, argDword3
  addss xmm0, xmm2	; REAL4 add in xmm regs
  cvtss2sd xmm0, xmm0	; REAL4 to REAL8
  add edx, r9d		; sum of two DWORD args (upper halves of rdx/r9 are not guaranteed)
  cvtsi2sd xmm1, edx	; to double
  addsd xmm0, xmm1	; REAL8 add in xmm regs
  movd rax, xmm0	; return REAL8 in xmm0 and rax
  ret
sumF endp
Init		; OPT_64 1	; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
;   int 3
  Print Str$("Result sum D: %f\n", rv(sumD, FP8(1000.11111), 100, FP8(10.0), 1))
  Print Str$("Result sum F: %f\n", rv(sumF, FP4(1000.11111), 100, FP4(10.0), 1))
EndOfCode	; OPT_Assembler  ML

Output:

This program was assembled with ml64 in 64-bit format.
Result sum D: 1111.111110
Result sum F: 1111.111084

One interesting aspect here: you can't feed REAL4 ("single") values to printf(), and you can't feed it with xmm regs alone. The CRT printf() wants a double (C promotes float varargs to double), and it wants it in a general-purpose register like rax (or in a REAL8 memory location). Therefore both sumD and sumF return doubles, in xmm0 and in rax.

I attach two executables, one with an int 3 right before the call to sumD. Run it from a DOS prompt; it won't wait for a keypress.

hutch-- · April 11, 2021, 10:25:29 AM

tenkey is correct:

sumf expects the values to be in xmm0, xmm1, xmm2, and xmm3.
It leaves the result in xmm0.
rcx, rdx, r8, r9, and the shadow space are untouched.
It uses REAL4 instructions.

The Microsoft x64 ABI specifies the first 4 XMM registers for arguments and XMM0 for the return value, if any. This is how you would interact with an external high-level language that requires ABI calling convention conformity. Within a 64-bit MASM app you can use the first 6 XMM registers (XMM0 to XMM5 are volatile) and still remain ABI compliant in terms of register usage, but that is not compatible with either the Win API or external procedure calls.

Also note that at least some CRT functions use the XMM registers and will overwrite any of the first 4 XMM registers; since XMM0 to XMM5 are volatile, any call is allowed to change them.

This is why, in the macro I posted above, you DO NOT MIX integer registers and XMM registers.

; -------------------------------------------------------------------
; with invoke, the integer args must come first. FP load must be last
; -------------------------------------------------------------------
invoke LoadTest,1111,2222,3333,4444, xloadrv(1234.1234, 5678.5678)

It certainly can be done other ways, but this tests reliably.

jj2007 · April 11, 2021, 10:41:16 AM

Quote from: tenkey on April 11, 2021, 08:11:22 AM
Compiling in release mode with optimizations enabled produces:

; Line 16
	movsd	xmm2, QWORD PTR __real@4024000000000000
	lea	rcx, OFFSET FLAT:??_C@_0BN@KKHEANPC@?$CFd?3?5float?5?$DN?5?$CFf?0?5double?5?$DN?5?$CFf?6@
	movaps	xmm3, xmm2
	movq	r9, xmm2
	movq	r8, xmm2
	mov	edx, 1
	call	printf

Because sumf and sumd do not have varargs, registers rcx, rdx, r8, and r9 are untouched.

That's how MSVC works.

The first four arguments are in rcx, rdx, r8 and r9, but as you write correctly, they are untouched because sumd and sumf are not VARARG. However, printf() is a VARARG function:

	float f = sumf(1.0, 2.0, 3.0, 4.0);
	double d = sumd(1.0, 2.0, 3.0, 4.0);
	printf("%d: float = %f, double = %f\n", 1, f, d);

rcx is the format string
rdx is the constant 1
r8 is f
r9 is d
xmm2 is a copy of f, d (the compiler seems to know that they are identical)
xmm3 is a copy of f, d

From my tests it seems that xmm2 and xmm3 are ignored by printf().

MSVC may behave differently if you add, in sumf and sumd, a random value to the sum, such as GetTickCount().

nidud · April 12, 2021, 12:03:48 AM

deleted

jj2007 · April 12, 2021, 12:09:15 AM

Quote from: nidud on April 12, 2021, 12:03:48 AM
movsd xmm0,1.0
movsd xmm1,2.0
printf("xmm0: %f\nxmm1: %f\n", xmm0, xmm1)
Output:

xmm0: 1.000000
xmm1: 2.000000

Post the executable

nidud · April 12, 2021, 02:15:19 AM

deleted

jj2007 · April 12, 2021, 06:34:13 AM

Quote from: nidud on April 12, 2021, 02:15:19 AM
Assembly is actually a low-level programming language, believe it or not

I believe you

        movq    r8, xmm1                                ; 0014 _ 66 49: 0F 7E. C8
        movq    rdx, xmm0                               ; 0019 _ 66 48: 0F 7E. C2
        lea     rcx, [DS0000]                           ; 001E _ 48: 8D. 0D, 00000000(rel)
        call    printf                                  ; 0025 _ E8, 00000000(rel)

Thanks, that's why I asked you to post the exe.

It confirms my tests: printf internally uses the 64-bit general-purpose registers, not the xmm regs. On the web, some people claim that you must pass the number of xmm regs used in al (or eax), but that doesn't look plausible.

nidud · April 12, 2021, 08:14:42 AM

deleted

jj2007 · April 12, 2021, 08:17:40 AM

Quote from: nidud on April 12, 2021, 08:14:42 AM
Quote
It confirms my tests: printf internally uses the 64-bit general-purpose registers, not the xmm regs. On the web, some people claim that you must pass the number of xmm regs used in al (or eax), but that doesn't look plausible.

That is true for Linux.

Ok, makes sense. I wonder how it works with mixed arguments, such as printf("%f %f %f %f", xmm0, MyDouble1, xmm1, MyDouble2).

nidud · April 12, 2021, 08:04:36 PM

deleted

The MASM Forum
