Topic: Ray Chen on Win64

hutch--

  • Administrator
  • Member
  • Posts: 8236
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Ray Chen on Win64
« on: June 28, 2016, 07:02:41 PM »

The history of calling conventions, part 5: amd64
---------------
Raymond Chen - MSFT, January 14, 2004
The last architecture I'm going to cover in this series is the AMD64 architecture (also known as x86-64).

The AMD64 takes the traditional x86 and expands the registers to 64 bits, naming them rax, rbx, etc. It
also adds eight more general purpose registers, named simply R8 through R15.

The first four integer or pointer parameters to a function are passed in rcx, rdx, r8 and r9 (floating-point
parameters use xmm0 through xmm3 instead). Any further parameters are passed on the stack. Furthermore,
32 bytes of space for the four register parameters are reserved on the stack, in case the called function
wants to spill them; this is important if the function is variadic.
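As a rough illustration of the register assignment just described, here is a small C sketch. The names int_arg_register, int_arg_index and HOME_AREA_BYTES are made up for this example. It covers integer and pointer arguments only; floating-point arguments, which travel in xmm0-xmm3, are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes the caller reserves so the callee can spill the four register
   parameters (4 registers x 8 bytes). Hypothetical name. */
enum { HOME_AREA_BYTES = 4 * 8 };

/* Integer/pointer argument registers, in argument order. */
static const char *const INT_ARG_REGS[4] = { "rcx", "rdx", "r8", "r9" };

/* Register carrying the i-th (1-based) integer/pointer argument,
   or NULL if that argument is passed on the stack instead. */
static const char *int_arg_register(int i)
{
    assert(i >= 1);
    return i <= 4 ? INT_ARG_REGS[i - 1] : NULL;
}

/* Reverse lookup: which argument (1-4) a register carries,
   or 0 if the register is not used for integer argument passing. */
static int int_arg_index(const char *reg)
{
    for (int k = 0; k < 4; k++)
        if (strcmp(INT_ARG_REGS[k], reg) == 0)
            return k + 1;
    return 0;
}
```

For SomeFunction(1, 2, 3, 4, 5) this puts 1 through 4 in rcx, rdx, r8 and r9, and int_arg_register(5) returns NULL because the fifth argument goes on the stack.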

Parameters that are smaller than 64 bits are not zero-extended; the upper bits are garbage, so remember
to zero them explicitly if you need to. Parameters that are larger than 64 bits are passed by address.
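To make the "upper bits are garbage" point concrete, the sketch below models a 64-bit register as a uint64_t. It illustrates the arithmetic only; a C compiler handles this for you. The helpers with_garbage_upper, zero_extend32 and sign_extend32 are hypothetical. In assembly the same effects come from "mov ecx, ecx" (writing a 32-bit register clears bits 32-63) and "movsxd rax, ecx".

```c
#include <assert.h>
#include <stdint.h>

/* Models what a callee may find in rcx for a 32-bit argument: the low
   32 bits are the value, the high 32 bits are whatever was left there. */
static uint64_t with_garbage_upper(uint32_t value, uint32_t garbage)
{
    return ((uint64_t)garbage << 32) | value;
}

/* Explicit zero-extension: keep only the low 32 bits. */
static uint64_t zero_extend32(uint64_t reg)
{
    return reg & 0xFFFFFFFFu;
}

/* Explicit sign-extension of the low 32 bits, written portably. */
static int64_t sign_extend32(uint64_t reg)
{
    uint32_t lo = (uint32_t)reg;
    return (lo & 0x80000000u) ? (int64_t)lo - 0x100000000LL : (int64_t)lo;
}
```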

The return value is placed in rax. If the return value is larger than 64 bits, then a secret first
parameter is passed which contains the address where the return value should be stored.
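The secret first parameter can be pictured in C as follows. make_triple is what you write, and make_triple_lowered is roughly what the compiler turns it into. Both names and the 24-byte struct Triple are made up for the example. Per Microsoft's x64 documentation, the real arguments shift right by one register (x moves from rcx to rdx), and the callee also hands the buffer address back in rax, which the sketch mirrors by returning the pointer.

```c
#include <assert.h>

/* 24 bytes: too large to come back in rax. */
struct Triple { long long a, b, c; };

/* The function as written in source. */
static struct Triple make_triple(long long x)
{
    struct Triple t = { x, 2 * x, 3 * x };
    return t;
}

/* Approximately how it is called at the machine level: the caller passes
   the address of its result buffer as a hidden first argument, and the
   callee fills the buffer and returns that same address. */
static struct Triple *make_triple_lowered(struct Triple *ret, long long x)
{
    ret->a = x;
    ret->b = 2 * x;
    ret->c = 3 * x;
    return ret;
}
```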

All general-purpose registers must be preserved across the call, except for rax, rcx, rdx, r8, r9, r10,
and r11, which are scratch.
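A lookup table captures the scratch/preserved split for the general-purpose registers. gpr_is_volatile is a hypothetical helper. The sketch says nothing about the xmm registers, which have their own volatile/non-volatile split.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* General-purpose registers a callee may destroy (scratch). */
static const char *const VOLATILE_GPRS[] = {
    "rax", "rcx", "rdx", "r8", "r9", "r10", "r11"
};

/* True if a value in `reg` may be clobbered by a call, so the caller has
   to save it elsewhere if it needs it afterwards. Every other
   general-purpose register (rbx, rbp, rdi, rsi, rsp, r12-r15) must be
   preserved by the callee. */
static bool gpr_is_volatile(const char *reg)
{
    for (size_t k = 0; k < sizeof VOLATILE_GPRS / sizeof VOLATILE_GPRS[0]; k++)
        if (strcmp(VOLATILE_GPRS[k], reg) == 0)
            return true;
    return false;
}
```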

The callee does not clean the stack. It is the caller's job to clean the stack.

The stack must be kept 16-byte aligned. Since the "call" instruction pushes an 8-byte return address,
this means that every non-leaf function is going to adjust the stack by a value of the form 16n+8 in
order to restore 16-byte alignment.
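The 16n+8 rule can be turned into a small calculation. The sketch assumes a non-leaf function that pushes no registers. It reserves slots for its largest outgoing call (never fewer than four, because the spill area is always allocated) plus some bytes of locals. frame_adjust is a made-up name. If the prologue also pushes non-volatile registers, each push shifts the required adjustment by 8, and this sketch does not handle that.

```c
#include <assert.h>

/* Bytes to subtract from rsp in the prologue of a non-leaf function that
   pushes nothing, given the argument count of its largest outgoing call
   and the size of its own locals. */
static unsigned frame_adjust(unsigned max_args, unsigned locals)
{
    unsigned slots = max_args < 4 ? 4 : max_args;  /* spill area is always 4 slots */
    unsigned bytes = slots * 8 + locals;
    /* Round up to the form 16n+8: together with the 8-byte return
       address, rsp is then 16-byte aligned at every call site. */
    while (bytes % 16 != 8)
        bytes++;
    return bytes;
}
```

For CallThatFunction (five arguments, no locals), frame_adjust(5, 0) gives 0x28, which matches the "sub rsp, 0x28" used in the sample.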

Here's a sample:

void SomeFunction(int a, int b, int c, int d, int e);
void CallThatFunction()
{
    SomeFunction(1, 2, 3, 4, 5);
    SomeFunction(6, 7, 8, 9, 10);
}
On entry to CallThatFunction, the stack looks like this:

xxxxxxx0   .. rest of stack ..   
xxxxxxx8   return address   <- RSP
Due to the presence of the return address, the stack is misaligned. CallThatFunction sets up its stack
frame, which might go like this:

    sub    rsp, 0x28
Notice that the local stack frame size is 16n+8, so that the result is a realigned stack.

xxxxxxx0   .. rest of stack ..   
xxxxxxx8   return address   
xxxxxxx0       (arg5)
xxxxxxx8       (arg4 spill)
xxxxxxx0       (arg3 spill)
xxxxxxx8       (arg2 spill)
xxxxxxx0       (arg1 spill) <- RSP
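The diagram can also be read as a formula. Assume the frame above: "sub rsp, 0x28" and no pushes. In the caller, slot i sits 8*(i-1) bytes above rsp. From the callee's side it is 8 bytes further away, because "call" has pushed the return address in between. caller_arg_offset and callee_arg_offset are illustrative names.

```c
#include <assert.h>

/* Offset from rsp of the 8-byte slot for argument i (1-based), as seen by
   the caller just before "call". Slots 1-4 are the spill area; slot 5
   onward hold the stack arguments themselves. */
static unsigned caller_arg_offset(unsigned i)
{
    assert(i >= 1);
    return 8 * (i - 1);
}

/* The same slot as seen by the callee on entry, after "call" has pushed
   an 8-byte return address. */
static unsigned callee_arg_offset(unsigned i)
{
    return caller_arg_offset(i) + 8;
}
```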
Now we can set up for the first call:

        mov     dword ptr [rsp+0x20], 5     ; output parameter 5
        mov     r9d, 4                      ; output parameter 4
        mov     r8d, 3                      ; output parameter 3
        mov     edx, 2                      ; output parameter 2
        mov     ecx, 1                      ; output parameter 1
        call    SomeFunction                ; Go Speed Racer!
When SomeFunction returns, the stack is not cleaned, so it still looks like it did above. To issue the
second call, then, we just shove the new values into the space we already reserved:

        mov     dword ptr [rsp+0x20], 10    ; output parameter 5
        mov     r9d, 9                      ; output parameter 4
        mov     r8d, 8                      ; output parameter 3
        mov     edx, 7                      ; output parameter 2
        mov     ecx, 6                      ; output parameter 1
        call    SomeFunction                ; Go Speed Racer!
CallThatFunction is now finished and can clean its stack and return.

        add     rsp, 0x28
        ret
Notice that you see very few "push" instructions in amd64 code, since the paradigm is for the caller to
reserve parameter space and keep re-using it.
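The whole sequence can be checked with a toy model that tracks only the value of rsp. No memory is simulated, and the function name and starting addresses are arbitrary. The model counts how many of the two calls to SomeFunction happen with rsp 16-byte aligned, and it asserts that the frame is balanced on return.

```c
#include <assert.h>
#include <stdint.h>

/* `rsp` is the value just before the caller executes "call CallThatFunction".
   Returns how many of the two SomeFunction call sites see rsp 16-byte aligned. */
static int call_that_function_aligned_calls(uint64_t rsp)
{
    const uint64_t start = rsp;
    int aligned = 0;

    rsp -= 8;                        /* call CallThatFunction: push return address */
    rsp -= 0x28;                     /* sub rsp, 0x28 (16n+8 with n = 2) */
    for (int k = 0; k < 2; k++) {    /* the two calls to SomeFunction */
        /* arguments are stored with mov into the reserved area; rsp does not move */
        if (rsp % 16 == 0)
            aligned++;
        rsp -= 8;                    /* call SomeFunction */
        rsp += 8;                    /* its ret; it leaves our frame untouched */
    }
    rsp += 0x28;                     /* add rsp, 0x28 */
    rsp += 8;                        /* ret */
    assert(rsp == start);            /* the frame is balanced */
    return aligned;
}
```

Starting from an aligned rsp such as 0x1000, both call sites come out aligned. Starting from 0x1008 (a caller that broke the rule), neither does, which is why every function in the chain has to cooperate.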
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

mineiro

  • Member
  • Posts: 684
Re: Ray Chen on Win64
« Reply #1 on: June 29, 2016, 09:46:45 AM »
I didn't know the meaning of the word 'non-leaf', even after reading this text three times. From my searching, it means a function that calls another function. Is that right?

So, how should I proceed when creating a recursive call, for example in a simple Fibonacci or factorial routine, or a harder sudoku solver? Should the first pass use 16N+8 and the second pass 16N? Is that the point?

Well, the more I read, the less I understand.  :dazzled:
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

rrr314159

  • Member
  • Posts: 1382
Re: Ray Chen on Win64
« Reply #2 on: June 29, 2016, 10:37:07 AM »
mineiro, non-leaf does indeed mean it calls another function. For recursive routines, each call must allocate its own shadow space; you can't reserve it all at the beginning, although you could test the stack to make sure there's enough available, if you know how deep the recursion will go. Coincidentally, the first code I posted here, which was 64-bit, uses recursion: http://masm32.com/board/index.php?topic=3926.0. It's very primitive (I didn't even know what a macro was back then), but you might find it amusing.
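On the recursion question: every invocation runs the same prologue, so no separate 16N+8 pass and 16N pass is needed. The call pushes 8 bytes and the prologue subtracts 16n+8, so each level consumes a multiple of 16 bytes and every level sees the same alignment. The sketch below models this with a recursive factorial that carries a simulated rsp. The 0x28 frame, FRAME, fact and the starting address are assumptions for illustration and are not taken from rrr314159's code.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME 0x28u   /* hypothetical prologue: sub rsp, 0x28 (16n+8) */

/* Recursive factorial carrying a simulated rsp. `rsp` is the value on
   entry (just after the return address was pushed). The lowest rsp
   reached is recorded in *low. */
static uint64_t fact(unsigned n, uint64_t rsp, uint64_t *low)
{
    rsp -= FRAME;                 /* same prologue at every depth */
    if (rsp < *low)
        *low = rsp;
    if (n <= 1)
        return 1;                 /* deepest level makes no further call */
    assert(rsp % 16 == 0);        /* aligned at this level's call site */
    return n * fact(n - 1, rsp - 8, low);   /* call pushes 8 bytes */
}
```

For 10 levels entered from an aligned rsp of 0x10000, the deepest frame ends 10 * 0x30 = 480 bytes lower. That is the kind of estimate you would use to check whether the stack has room for a given recursion depth.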
I am NaN ;)

mineiro

  • Member
  • ****
  • Posts: 684
Re: Ray Chen on Win64
« Reply #3 on: June 29, 2016, 09:19:27 PM »
Thanks for answering, sir rrr314159; I will study your code. I have read that topic; now it is time to think a lot.
In Chen's code he used integers (sdwords) and says that the upper part of the qword is ignored. I think that if the recursion stays below a few gigabytes and we are working with signed 32-bit numbers, we can store an unsigned 32-bit return address in that 'wasted' stack space. Thinking about it more, I have a nice idea about this; if it works, I will report back after some tests.
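Reusing the unused upper half of an 8-byte slot amounts to packing two 32-bit values into one qword. A sketch of that packing follows, using hypothetical helper names. It only works if the extra value really fits in 32 bits, and the called function must still treat the high half of an int argument as garbage, as Chen's rule requires.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a signed 32-bit argument (low half) and an unsigned 32-bit
   extra value (high half) into one 8-byte stack slot. */
static uint64_t pack_slot(int32_t arg, uint32_t extra)
{
    return ((uint64_t)extra << 32) | (uint32_t)arg;
}

/* Recover the signed low half (what reading the low dword as an sdword gives). */
static int32_t slot_arg(uint64_t slot)
{
    uint32_t lo = (uint32_t)slot;
    return lo <= INT32_MAX ? (int32_t)lo
                           : (int32_t)(lo - 0x80000000u) + INT32_MIN;
}

/* Recover the unsigned high half. */
static uint32_t slot_extra(uint64_t slot)
{
    return (uint32_t)(slot >> 32);
}
```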

mineiro

  • Member
  • Posts: 684
Re: Ray Chen on Win64
« Reply #4 on: July 01, 2016, 10:45:37 AM »
x64 PRIMER: Everything You Need To Know To Start Programming 64-Bit Windows Systems - Matt Pietrek
May 2006
https://msdn.microsoft.com/magazine/msdn-magazine-issues

Quote
The subject of pointers and DWORDs segues nicely into the Win64 type system. How big is a pointer? How about a LONG? And what about handles, like HWNDs? Mercifully, when Microsoft made the messy transition from Win16 to Win32, they made the new type models easy to extend further to 64 bits. Generally speaking, and with a few exceptions, all types other than pointers and size_t are exactly the same in the new 64-bit world as in Win32. That is, a 64-bit pointer is 8 bytes, while int, long, DWORD, and HANDLE are still 4 bytes. I'll talk more about types later when I discuss developing for Win64. [ Editor's Update - 5/2/2006: Handles are defined as pointer values. Thus in Win64, a handle is 8 bytes, not 4 bytes.]
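The size rules in the quote can be pinned down with compile-time checks (C11 static_assert). DWORD_T and HANDLE_T are stand-ins defined here so the snippet builds without <windows.h>. On Windows, the real DWORD and HANDLE come from the Windows headers. The long check only runs under _WIN64, because 64-bit Linux uses the LP64 model, where long is 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Windows typedefs discussed in the quote. */
typedef uint32_t DWORD_T;   /* DWORD: 4 bytes in both Win32 and Win64 */
typedef void    *HANDLE_T;  /* HANDLE is a pointer type: 8 bytes on Win64 */

static_assert(sizeof(DWORD_T) == 4, "DWORD stays 32-bit");
static_assert(sizeof(HANDLE_T) == sizeof(void *), "handles are pointer-sized");

#ifdef _WIN64
/* LLP64: long stays 4 bytes on 64-bit Windows, pointers grow to 8. */
static_assert(sizeof(long) == 4, "long is 4 bytes on Win64");
static_assert(sizeof(void *) == 8, "pointers are 8 bytes on Win64");
#endif
```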

mineiro

  • Member
  • Posts: 684
Re: Ray Chen on Win64
« Reply #5 on: July 07, 2016, 03:09:36 AM »
I found this document on my hard drive.

habran

  • Member
  • Posts: 1226
    • uasm
Re: Ray Chen on Win64
« Reply #6 on: July 07, 2016, 05:51:58 AM »
That is an excellent document, sir mineiro :t
The only thing missing is the VECTORCALL convention.
 
Cod-Father