Back to "REAL8 in xmm reg?"
If you are interfacing to MSVC C/C++ APIs (gcc has a different one), the answer is yes if it's one of the first four arguments of the C/C++ function definition or call. Win64 functions are MSVC types of functions.
I compiled a simple C program in VS2019 (Community) to find out what regs are used.
#include <stdio.h>
float sumf(float a, float b, float c, float d)
{
return a + b + c + d;
}
double sumd(double a, double b, double c, double d)
{
return a + b + c + d;
}
int main(int argc, char** argv)
{
float f = sumf(1.0, 2.0, 3.0, 4.0);
double d = sumd(1.0, 2.0, 3.0, 4.0);
printf("%d: float = %f, double = %f\n", 1, f, d);
}
Compiling in release mode with optimizations enabled produces:
; Function compile flags: /Ogtpy
; COMDAT sumf
_TEXT SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumf PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 4
addss xmm0, xmm1
addss xmm0, xmm2
addss xmm0, xmm3
; Line 5
ret 0
sumf ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT sumd
_TEXT SEGMENT
a$ = 8
b$ = 16
c$ = 24
d$ = 32
sumd PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 9
addsd xmm0, xmm1
addsd xmm0, xmm2
addsd xmm0, xmm3
; Line 10
ret 0
sumd ENDP
_TEXT ENDS
; Function compile flags: /Ogtpy
; COMDAT main
_TEXT SEGMENT
argc$ = 48
argv$ = 56
main PROC ; COMDAT
; File D:\Henry\Win64asm\C++\fpargs\fpargs\fpargs.cpp
; Line 13
$LN8:
sub rsp, 40 ; 00000028H
; Line 16
movsd xmm2, QWORD PTR __real@4024000000000000
lea rcx, OFFSET FLAT:??_C@_0BN@KKHEANPC@?$CFd?3?5float?5?$DN?5?$CFf?0?5double?5?$DN?5?$CFf?6@
movaps xmm3, xmm2
movq r9, xmm2
movq r8, xmm2
mov edx, 1
call printf
; Line 17
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
main ENDP
_TEXT ENDS
sumf expects the values to be in xmm0, xmm1, xmm2, and xmm3.
It leaves the result in xmm0.
rcx, rdx, r8, r9, and the shadow space are untouched.
It uses REAL4 instructions,
sumd is the same, except that it uses REAL8 instructions.
Because it is in the same file and the functions are called with constants, the compiler has optimized away the calls to sumf and sumd. The main function is reduced to:
printf("%d: float = %f, double = %f\n", 1, 10.0, 10.0);
The REAL8 10.0 is stored in CONST memory, and because printf has varargs, the 10.0 is loaded into xmm2, xmm3, r8, and r9.
The unoptimized debug version retains the function calls:
; Line 14
movss xmm3, DWORD PTR __real@40800000
movss xmm2, DWORD PTR __real@40400000
movss xmm1, DWORD PTR __real@40000000
movss xmm0, DWORD PTR __real@3f800000
call sumf
movss DWORD PTR f$[rbp], xmm0
; Line 15
movsd xmm3, QWORD PTR __real@4010000000000000
movsd xmm2, QWORD PTR __real@4008000000000000
movsd xmm1, QWORD PTR __real@4000000000000000
movsd xmm0, QWORD PTR __real@3ff0000000000000
call sumd
movsd QWORD PTR d$[rbp], xmm0
Because sumf and sumd do not have varargs, registers rcx, rdx, r8, and r9 are untouched.
That's how MSVC works.