fastcall 64 bits rules

Started by TouEnMasm, August 16, 2015, 06:02:24 PM


TouEnMasm

I am trying to write a brief but reasonably complete note on the subject.
This is what I have so far:
Quote
comment µ
   FASTCALL convention
*************** the stack address must be aligned to 16 when calling an API *************
Stack space for the arguments is not reserved for you as it is in 32 bits.
The stack is aligned to 8 (but not to 16) on reaching the start label, but it must be aligned to 16 when calling APIs.
You need to allocate at least the shadow space for RCX, RDX, R8 and R9:
---------------------------------------------------------------
----------- add rsp,-(8 + 4 * 8 ) ; - (align + shadow space)
---------------------------------------------------------------

Preserve or not ?
;------------- memo --------------------------
;RBX, RBP, RDI, RSI, R12, R13, R14, and R15    ;must be preserved (nonvolatile)
;RAX, RCX, RDX, R8, R9, R10, and R11           ;need not be preserved (volatile, scratch)
;---------------------------------------------------

A call does not use the stack for the first four arguments; they go in registers, in this order:
;--------------------------------------------------------------------
;RCX: 1st integer argument RDX: 2nd integer argument R8: 3rd integer argument R9: 4th integer argument
;Any further arguments go on the stack
;--------------------------------------------------------------------

µ
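
The `add rsp,-(8 + 4 * 8)` line in the note can be checked with a little arithmetic. Below is a minimal sketch in Python (not MASM), assuming RSP is 8 bytes off a 16-byte boundary at procedure entry because the CALL just pushed a return address. The helper name `frame_size` is hypothetical and only serves the illustration.

```python
def frame_size(nargs):
    """Bytes to subtract from RSP at procedure entry (hypothetical helper).

    Assumes RSP % 16 == 8 on entry, because the caller's CALL pushed an
    8-byte return address onto a 16-byte aligned stack.
    """
    slots = max(nargs, 4)   # shadow space always covers the 4 register args
    size = slots * 8        # one 8-byte slot per argument
    if size % 16 == 0:      # entry offset is 8, so size must be 8 mod 16
        size += 8           # extra 8 bytes restore 16-byte alignment
    return size


for n in range(7):
    print(f"{n} args: sub rsp, {frame_size(n)}")
```

For up to four arguments this gives 40, the same value as `8 + 4 * 8` in the note; the result only grows once the largest call in the procedure takes more than five arguments.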

Contributions are welcome.
I have attached a sample.



Fa is a musical note to play with CL

rrr314159

BTW this belongs in 64-bit forum ...

Most important fact missing is that when more than 4 arguments they're passed on the stack, below the shadow space.

Other details ... usually adding (or, subtracting) 8 to rsp puts it on a 16 boundary because it's already aligned to 8 (the return address after a call) but not always, one should check (as in my invoke_macros in 64-bit forum). Also, quite a few API's work unaligned, (aligned to 4, or even to an odd number!). You can't count on that in future OS's. But the thing to watch out for, if you test your calling technique with one of those API's it will work even tho alignment is not correct, so later it will fail on others. (This gotcha has happened to me and others; there are old threads on the subject where they never figured out what was going on).

Another wrinkle is floating point. Most people think they are "always" passed in XMM reg's but no. The main exception is VARARG calls such as print functions; reals are passed in GPR's, and XMM's are ignored. There are other exceptions.

- A few more details, like how to pass args that aren't 8 bytes ... look at my invoke macro post I think it's all covered there - at least, in the code if not the words
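
On the floating-point wrinkle: as far as I understand the Microsoft x64 convention, for a VARARG call the caller is expected to place a real in the integer register for that position (and duplicate it in the matching XMM register), and the callee reads the integer copy. What travels in the GPR is simply the raw 64-bit pattern of the double. A minimal Python sketch of that pattern follows; the helper name `double_bits` is hypothetical.

```python
import struct


def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of a double as an integer.

    This is the value a caller would load into RCX/RDX/R8/R9 when
    passing a REAL8 to a VARARG function (hypothetical illustration).
    """
    return struct.unpack("<Q", struct.pack("<d", x))[0]


for value in (1.0, 2.5):
    print(f"{value} -> {double_bits(value):016X}h")
```

So passing 1.0 as the second argument of a printf-style call would mean loading 3FF0000000000000h into RDX.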
I am NaN ;)

MichaelW

Quote from: rrr314159 on August 17, 2015, 02:47:14 AM
Most important fact missing is that when more than 4 arguments they're passed on the stack, below the shadow space.

What exactly do you mean by "below"?
Well Microsoft, here's another nice mess you've gotten us into.

rrr314159

Actually I was incorrectly thinking backwards; they're not below, they're above. I forgot args are put on the stack in reverse order, so later args are at higher addresses. Thanks for catching that.
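
To make "above" concrete, here is a small Python sketch of where each integer argument sits just before the CALL instruction, assuming the shadow space occupies [rsp+0] through [rsp+31]. The helper name `arg_location` is hypothetical.

```python
def arg_location(n):
    """Where the caller puts integer argument n (1-based) before CALL.

    Assumes the 32-byte shadow space starts at [rsp+0], so the first
    stack argument lands directly above it (hypothetical helper).
    """
    registers = ("RCX", "RDX", "R8", "R9")
    if n <= 4:
        return registers[n - 1]        # first four go in registers
    return f"[rsp+{(n - 1) * 8}]"      # later args sit above the shadow space


for n in range(1, 8):
    print(f"arg {n}: {arg_location(n)}")
```

Inside the callee every stack offset is 8 higher, because the CALL has pushed the return address by then.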

TouEnMasm

Quote
But the thing to watch out for, if you test your calling technique with one of those API's it will work even tho alignment is not correct, so later it will fail on others. (This gotcha has happened to me and others; there are old threads on the subject where they never figured out what was going on)

Do you have a small sample of that?

rrr314159

; Sample of API that works unaligned (printf) and API that doesn't (MessageBoxA)
; rrr314159 8/17/2015
; \bin\JWasm -win64 sample.asm
; \bin\link /subsystem:console sample.obj

option casemap:none
includelib \lib\kernel32.lib
includelib \lib\user32.lib
includelib \lib\msvcrt.lib
printf proto :ptr SBYTE, :VARARG
MessageBoxA proto :QWORD, :ptr BYTE, :ptr BYTE, :DWORD
ExitProcess proto :DWORD
.data
    teststring db "Printf Works With Unaligned RSP: ", 0
    formatstring db "%s%x", 10, 0
    MBstring1 db "MessageBox Must Have Aligned RSP", 0
    MBstring2 db "Sample", 0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
    and rsp, -10h

    sub rsp, 1
        invoke printf, addr formatstring, addr teststring, rsp
    add rsp, 1

    ; uncomment below rsp add/sub lines and it will fail

    ;sub rsp, 1
        invoke MessageBoxA,0,addr MBstring1,addr MBstring2,1
    ;add rsp, 1

    invoke ExitProcess, 0   ; pass an explicit exit code instead of whatever RCX holds
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
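
The two stack adjustments in the sample can be followed with plain integer arithmetic. A minimal Python sketch, assuming a 64-bit RSP; the starting value is an arbitrary example, not a real stack address, and the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def align_down_16(rsp):
    """Same effect as `and rsp, -10h` on a 64-bit register."""
    return rsp & (-0x10 & MASK64)   # -10h is FFFFFFFFFFFFFFF0h as a mask


def is_aligned_16(rsp):
    """True when RSP sits on a 16-byte boundary."""
    return rsp % 16 == 0


rsp = align_down_16(0x14FF58)       # arbitrary example value
print(f"{rsp:X}h aligned: {is_aligned_16(rsp)}")
print(f"{rsp - 1:X}h aligned: {is_aligned_16(rsp - 1)}")   # after `sub rsp, 1`
```

After `sub rsp, 1` the stack is on an odd address, which is the state printf tolerates in the sample and MessageBoxA does not.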

TouEnMasm


printf is a C prototype, and something is wrong with the call.
I tried to declare it with a C prototype, but it is not so easy.
 

rrr314159

Does that mean my sample doesn't work on your machine? There are many other calls that will work unaligned, but since I more-or-less always use my nvk macro (align whether it's needed or not) I've forgotten which ones. (I even forgot, momentarily, that args are put on stack reverse order! nvk is much more trouble-free than JWasm invoke). I can look it up in my notes, if I can find them ... I think, for instance, GetModuleHandle works unaligned, but not GetFileSize ... My guess was, it depends whether the function saves XMM's on the stack. Let me know if you need more info, but the only safe course is, always align.

TouEnMasm


It means that the linker doesn't find the function (bad proto).
And if I change it to PROTO C, I get a bad-argument error: it doesn't accept addr ... (jwasm, ml64)


rrr314159

It's odd, when proto has a pointer, like :ptr SBYTE, you can change that to anything at all, like :ptr ANYTHING_AT_ALL, and it still works ! On my system, it only notices "ptr" and assumes a qWord. That's true with JWasm, I forget if ML64 is like that also. Maybe yours is different, and SBYTE should be something else, like BYTE.

Anyway why don't you substitute your own .inc files for the three proto's, like

include \mypath\kernel32.inc
include \mypath\user32.inc

etc?

I was using the proto's instead of .inc's to "keep it simple" (since we may have different sets of .incs) but perhaps that was not a good idea

I've run into similar incompatibilities before, don't know why; on all my 64-bit machines this code works

TouEnMasm

Sorry to be a little slow, but with Windows 10 many CRT functions are _INLINE,
and you need to build a library to use them.

After doing that, I get the same results as you, OK.
Thanks for the sample

NOTE:
fprintf is a C prototype in 32 bits ; dumpbin shows: 2012 _fprintf
fprintf is a std prototype in 64 bits ; dumpbin shows: 1FCE printf
Quote
#ifdef _M_CEE_PURE
    #define __CLRCALL_PURE_OR_CDECL __clrcall
#else
    #define __CLRCALL_PURE_OR_CDECL __cdecl
#endif
#define __CRTDECL __CLRCALL_PURE_OR_CDECL

_Check_return_opt_
_CRT_STDIO_INLINE int __CRTDECL printf(
    _In_z_ _Printf_format_string_ char const* const _Format,
    ...)


rrr314159

After doing that, I get the same results as you, OK

- glad to hear that

Thanks for info re. C vs STD proto, something to watch out for