unknown argument type -> xmm0

Started by markallyn, October 29, 2017, 04:55:17 AM

markallyn

Hello everyone,

Ah yes, the steep learning curve is still steep.

I have no idea why ML64 is claiming that the XMM0 register is an "unknown argument type" when it hits the following couple of lines:
Quote
{prolog}

sub rsp, 30h
movsd  xmm0, pie ;pie is declared as a REAL8 (3.14159) in the data segment
invoke printf, ADDR format, xmm0

...
{epilog}


I must be doing something stupid because using CPUID it is clear that the SSE technology is there on the board.

Regards,
Mark Allyn

jj2007

Mark,
The problem is probably the CRT. This code works just fine:
include \Masm32\MasmBasic\Res\JBasic.inc
.code
pie REAL8 3.141592653589793238
Init
  movsd  xmm0, pie ;pie is declared a real8 in 3.14159 in data segment
  usedeb=1
  deb 4, "PI in xmm0", f:xmm0
  Inkey Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
EndOfCode


Output:
PI in xmm0      f:xmm0  3.141592654
This code was assembled with ml64 in 64-bit format

markallyn

Hi JJ,

Thanks for responding.  I do appreciate it!  After considerable messing around, it turned out that the problem was coming from the invoke macro.  Invoke doesn't appear to "know" the SSE registers.  If I use a call printf instruction, then printf "sort of" works.  I say "sort of" because the contents of xmm0 must first be placed in a GP register--in this case rdx (rcx contains the format).  In other words, printf doesn't seem to know the SSE registers for floats.  I don't understand why this is going on.  If you can enlighten me on this point, it would be very informative for me.

The other puzzling feature of this concerns the rax register.  On Linux systems, rax (strictly, al) holds the number of SSE registers used when calling a variadic function.  Apparently, this isn't the case with 64-bit MASM on Windows.  Am I wrong?

Regards,
Mark


hutch--

The "invoke" macro is written in accordance with the Microsoft ABI, which directly handles arguments from BYTE to QWORD as register/stack arguments. When you use SSE or AVX registers, you load them directly and use the CALL mnemonic.
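
For reference, here is a minimal sketch of the manual call described above. It assumes plain ml64 without the SDK macros and a CRT import library that still exports printf (for example msvcrt.lib); both are assumptions, not something stated in the thread, and the names fmt and pie are illustrative. Under the Microsoft x64 convention, a floating-point argument to a variadic function goes in the XMM register for its position and must be duplicated in the matching integer register. The double is the second argument here, so it belongs in XMM1 and RDX rather than XMM0, which is consistent with Mark's observation that loading rdx made printf work. RAX is not used to pass a register count, unlike AL in the System V convention on Linux.

```asm
; Sketch only: plain ml64, no masm64rt.inc. Names fmt and pie are illustrative.
extern printf:proc

.data
fmt  db "%f", 13, 10, 0
pie  real8 3.14159

.code
main proc
    sub   rsp, 28h              ; 32 bytes shadow space + 8 to realign the stack to 16
    lea   rcx, fmt              ; argument 1: address of the format string
    movsd xmm1, pie             ; argument 2: the double in XMM1 ...
    mov   rdx, qword ptr pie    ; ... and the same bits in RDX, as varargs require
    call  printf
    xor   eax, eax              ; return 0
    add   rsp, 28h
    ret
main endp
end
```

The value is loaded into RDX straight from memory here, which avoids any register-to-register transfer between XMM1 and RDX.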

aw27

Quote
I must be doing something stupid because using CPUID it is clear that the SSE technology is there on the board.
All x64 CPUs support at least SSE2.

jj2007

Quote from: markallyn on October 29, 2017, 09:11:25 AM
Invoke doesn't appear to "know" the sse registers.  If I use a call printf instruction, then printf  "sort of" works. I use the qualifier "sort of" because the xmm0 register must first be placed in a gp register--in this case rdx (rcx contains the format).  In other words, printf doesn't seem to know the sse registers for floats.

Mark,
As Hutch alluded already, using xmm regs so directly is not really foreseen by the x64 ABI, and it would be surprising if good ol' CRT was more advanced. So a workaround is needed. Nice to see that rdx works, here is another one:
  sub rsp, 2*REAL8 ; simulate a memory variable (attention alignment & shadow space...)
  movlps [rsp], xmm0
  jinvoke crt_printf, Chr$("xmm0", 9, "%f", 13, 10), REAL8 ptr [rsp]
  add rsp, 2*REAL8


This works exactly like...

jinvoke crt_printf, Chr$("pie", 9, "%f"), pie

...where jinvoke knows from the .data declaration that pie is a REAL8.

aw27

Quote
sub rsp, REAL8         ; simulate a memory variable (attention alignment & shadow space...)
The stack should already be aligned at that point, so subtracting a single REAL8 (8 bytes) leaves it misaligned.

jj2007

Right, corrected above. I haven't touched this stuff for a while, apologies.

hutch--

I am yet to see why anyone would want to pass registers as arguments when the Microsoft ABI specifically uses 4 64 bit registers, rcx, rdx, r8 and r9 and you just cannot fit anything bigger into a 64 bit register. If you want to pass data in SSE, AVX or AVX2 registers to a procedure, write the data directly to the registers then call the procedure using any of the 64 bit registers OR SMALLER. Those larger registers are NOT CHANGED when an ABI compliant procedure call is made as it only uses the previously mentioned 4 specific registers and if there are more than 4 arguments stack addresses are used.

Even if you roll your own procedure calls, you are doing a double data transfer to and from a larger-than-64-bit register, which makes your procedure call slower for no purpose.

habran

UASM is able to handle xmm, ymm and zmm registers with VECTORCALL.
I have attached an asm file with examples of how to use it.
Check it.
Cod-Father

markallyn

Hello everyone,

My thanks to all of you for helping me with this one.  I haven't yet looked at Habran's material, but will certainly do so.

As a beginner I can only say that I do strange, inexplicable things with code simply as experiments.  Sometimes they work, sometimes not.  This one didn't work in the sense of producing usable code, but for beginner experiments it did yield a great deal of useful information--thanks to all of you--so, in a sense, it succeeded. 

More to follow ...

Regards to all.
Mark

felipe

markallyn: It's OK to experiment in a controlled way, I think. But if you don't want to waste time, it will be better for you to learn the basic things first (whatever the subject may be); then the errors in this controlled environment will be more instructive for you.

Just a piece of advice, btw.

markallyn

Hello again, everyone,

OK, here is my last post on this one, though others may wish of course to respond.

The following code works.  Printf is called with rdx containing the results of a silly little SIMD addition of two real8 floats per vector.

Quote
include \masm32\include64\masm64rt.inc
printf   PROTO :QWORD, :VARARG

.data
frmt   BYTE   "%lf",13,10,0
dblfrmt   BYTE   "%lf",13,10,0
ALIGN 16
v1   REAL8    1.5,1.6
ALIGN 16
v2   REAL8   2.5,2.6


.data?
ALIGN 16
buff   real8 2 dup(?)


.code
main PROC
align 16
movapd   xmm0, v1 ;[eax]
movapd   xmm1, v2 ;[ebx]
addpd    xmm0, xmm1
movapd   [buff], xmm0
mov   rdx, real8 ptr[buff]
invoke   printf, ADDR frmt, rdx
mov   rdx, real8 ptr[buff+8]
invoke   printf, ADDR frmt, rdx
ret
main ENDP
END

One important finding of this experiment was that the numbers must be real8.  Printf doesn't work if one attempts to print real4's as decimal numbers.  It will print the correct IEEE 754 hex representation if the format string is %x, but not if it is %f.
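
A likely explanation, offered as an assumption rather than something confirmed in the thread: printf's %f always expects a 64-bit double, because C promotes float to double in variadic calls. A REAL4 therefore has to be widened before the call, for instance with cvtss2sd. The sketch below uses plain ml64 without the SDK macros; the names fval and tmp are hypothetical, and linking against a CRT import library that exports printf is assumed.

```asm
; Sketch only: print a REAL4 with %f by widening it to REAL8 first.
extern printf:proc

.data
fmt   db "%f", 13, 10, 0
fval  real4 1.5                 ; hypothetical single-precision value

.data?
tmp   real8 ?                   ; scratch slot for the widened value

.code
main proc
    sub      rsp, 28h           ; shadow space + stack realignment
    cvtss2sd xmm1, fval         ; widen REAL4 to REAL8 in XMM1
    movsd    tmp, xmm1          ; spill so the same bits can be loaded into RDX
    mov      rdx, qword ptr tmp ; varargs: duplicate the double in RDX
    lea      rcx, fmt           ; argument 1: address of the format string
    call     printf
    xor      eax, eax           ; return 0
    add      rsp, 28h
    ret
main endp
end
```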

I do have one question.  Must I use an ALIGN 16 directive for each of the REAL8 declarations, or would a single ALIGN 16 at the beginning of the .DATA segment be sufficient to align everything to 16 bytes?

Lastly, to Felipe.  I couldn't agree more with your advice to learn the basics first.  I'd love to do that.  But there arises the question:  What are "the basics"?  I work on my own.  I have no colleagues, although I do have a very patient dog.  And I have the forum--you guys.  And a couple of books.  That's it.  So, I blunder along doing the best I can.

Thanks, folks.

Mark

hutch--

Mark,

If you set an "align 16" at a location in the .data section, the following variables will only remain aligned if each of them is a multiple of 16 bytes long. Alignment is required if you are going to use instructions that demand it; some do and some don't. As space is not a big deal in 64-bit code, you do better if all of your data is aligned properly.
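
To illustrate the point, here is a small data-section sketch. The variable names marker and v3 are hypothetical; v1 and v2 follow Mark's listing. A single ALIGN 16 carries over only while every item in between is a multiple of 16 bytes.

```asm
.data
ALIGN 16
v1     REAL8 1.5, 1.6     ; 16 bytes, starts on a 16-byte boundary
v2     REAL8 2.5, 2.6     ; still aligned, because v1 is exactly 16 bytes long
marker BYTE  1            ; 1 byte: whatever follows is now off the boundary
ALIGN 16                  ; pads up to the next 16-byte boundary
v3     REAL8 3.5, 3.6     ; aligned again, safe for movapd
```

Without the second ALIGN 16, a movapd load from v3 would fault, whereas movupd would still work.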