push/pop in 64 bit

Started by JK, July 01, 2020, 06:25:13 AM

JK

I realize that I cannot use push and pop for 8 byte registers (rax, ... , r15) in 64 bit mode as I could in 32 bit mode. For example, saving a register value temporarily on the stack and retrieving it later on, which works flawlessly in 32 bit, doesn't always work in 64 bit. The same code (replacing every 4 byte register with an 8 byte register, EAX -> RAX, etc.) doesn't work or even crashes.

In 64 bit mode the stack must be correctly aligned (16 bit, is this correct?) - ok, but how could I misalign the stack by pushing/popping 8 byte (64 bit) registers, 64 bit being a multiple of 16 bit? Please note, I don't build my own stack frames, I let the assembler do this work, I just want to save some registers inside a routine. If I use a local or a global for temporarily saving register values, it works. But why can't I simply use the stack for this in 64 bit like I can in 32 bit? What are the rules?

Looking at a disassembly, I see that there is no rbp based stack frame in 64 bit (assembler: UASM). Did I miss something here?

Any help and explanation appreciated - thanks


JK

jj2007

You can use pairs of push & pop, thus maintaining the 16-byte alignment. However, check carefully the concept of shadow space, e.g. googling for shadow space x64

Mikl__

Hi, JK!
When you call an API function, the value in RSP should be a multiple of 10h. PUSH reg/mem/imm decreases RSP by 8, and POP reg/mem increases RSP by 8. If all of your PUSH/POP pairs are completed before the API call, you can use as many of them as you want. But if the API call sits between the PUSH and the POP, you have to use an even number of PUSH instructions (with matching POPs), or use local variables instead:
    mov old_eax,eax
      . . . .
    mov eax,old_eax
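
If an odd number of pushes is unavoidable, the padding idea can be sketched roughly like this. It assumes RSP is already a multiple of 10h at this point (for example inside a frame the assembler set up), and `SomeApi` is a hypothetical function taking no more than four arguments. The extra 20h bytes are an assumption on the safe side: the callee is allowed to overwrite the 32 bytes just above its return address, so a value pushed right before the call should not sit there.

```asm
    push rax            ; save rax: RSP is now 8 bytes off a 10h boundary
    sub  rsp, 28h       ; 8 bytes padding + 20h fresh shadow space: aligned again
    call SomeApi        ; hypothetical API with up to four register arguments
    add  rsp, 28h       ; release padding and shadow space
    pop  rax            ; rax restored, RSP back where it started
```

With two pushes the padding is not needed for alignment, but the shadow space consideration stays the same.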

hutch--

Hi JK,

The stack system in Win64 is not designed to be used like the win32 stack and it is a different calling convention. The 16 byte alignment is necessary and arguments are passed first with 4 registers and subsequently at specific locations on the stack.

I have attached a zipped chm help file that has this type of data and it is technically correct.

JK

Maybe I should post an example. This is UASM code for 32 and 64 bit:

;*************************************************************************************
;assemble console 32           ;assemble 32 bit
assemble console 64            ;assemble 64 bit
;*************************************************************************************


ifdef _WIN64
  option win64:15
 
else
  .386
  .model flat, stdcall
endif


include <windows.inc>

includelib kernel32.lib
includelib user32.lib


.code


start proc
;*************************************************************************************
;
;*************************************************************************************


ifdef _WIN64
  push rax                                            ;why must it be two pushes ?
;  push rax
else
  push eax
endif 


  invoke MessageBoxA, 0, CSTR("works"), CSTR("test"), 0


ifdef _WIN64
  pop rax                                             ;and two pops in order to work in 64 bit
;  pop rax
else
  pop eax
endif 


  invoke ExitProcess, 0
  ret

   
start endp


end start


If I have two pushes and pops, it works in 64 bit; if I have only one push and one pop, it crashes in 64 bit. Pushing a 64 bit register (64 is a multiple of 16) keeps the stack aligned to 16 bit (10h), so it shouldn't cause problems, but as the code demonstrates - it does! Pushing two 64 bit registers (thus pushing 128 bit) works in 64 bit - why is that?


JK

hutch--

The answer is: don't use win32 techniques in win64. You write to stack addresses without messing up the alignment of the stack. When you use the CALL RET pair you are changing the stack by 8 bytes both ways, so you must ensure that your start address is aligned correctly. I have had a quick play with UASM and got it to work, but you would need one of the UASM guys to decipher how to set it up so it works OK.
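
Writing to a stack address instead of pushing could look roughly like the sketch below. It assumes ml64-style syntax, a procedure for which the assembler generates no prologue of its own, and hypothetical names (`show_box`, `box_text`, `box_title`). The frame is reserved once, and the register goes into a slot above the shadow space, so RSP never moves between the prologue and the call.

```asm
extern MessageBoxA:proc         ; resolved from user32.lib

.data
box_text  db "works", 0         ; hypothetical strings for the demo
box_title db "test", 0

.code
show_box proc
                                ; entry: RSP = 16n + 8 (return address on top)
    sub  rsp, 28h               ; 20h shadow space + one 8 byte slot, RSP now a multiple of 10h
    mov  [rsp+20h], rax         ; save rax in the slot above the shadow space
    xor  ecx, ecx               ; hWnd = NULL
    lea  rdx, box_text          ; lpText
    lea  r8, box_title          ; lpCaption
    xor  r9d, r9d               ; uType = MB_OK
    call MessageBoxA            ; may overwrite [rsp]..[rsp+1Fh], but not the slot
    mov  rax, [rsp+20h]         ; restore rax
    add  rsp, 28h               ; release the frame
    ret
show_box endp
end
```

The slot at [rsp+20h] is only safe here because MessageBoxA takes four arguments; a call with more arguments would use that address for its fifth one.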

I work with MASM and have multiple stack techniques available to deal with a number of different stack requirements.

jj2007

Quote from: JK on July 02, 2020, 06:43:51 AM
Pushing a 64 bit register (64 is a multiple of 16) keeps the stack aligned to 16 bit (10h)

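Nope. Eax is 4 bytes, rax is 8 bytes, not 16 as you seem to believe. The requirement is 16-byte (128-bit) alignment, not 16-bit, so a single 8-byte push leaves RSP 8 bytes off. So, as explained above, you can use pairs of push & pop, thus maintaining the 16-byte alignment.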

JK

A-ha, I see. Thanks for your explanations!

So the rule in 64 bit is: RSP must be 16 byte aligned before calling a procedure.
In 64 bit I can do all the things with the stack I can do in 32 bit as long as this rule is met - is this correct?
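
The rule can be traced by hand in a small entry point written without `invoke`. This is only a sketch, assuming ml64-style syntax and no assembler-generated prologue; the data names are hypothetical. Because the entry point itself is reached through a CALL, RSP starts 8 bytes off a 10h boundary, so here it is a single push that restores the alignment.

```asm
extern MessageBoxA:proc         ; resolved from user32.lib
extern ExitProcess:proc         ; resolved from kernel32.lib

.data
msg_text  db "works", 0         ; hypothetical strings for the demo
msg_title db "test", 0

.code
start proc
                                ; entry: RSP = 16n + 8 (return address on top)
    push rbx                    ; RSP = 16n, one push realigns the stack here
    sub  rsp, 20h               ; shadow space, RSP still a multiple of 10h
    xor  ecx, ecx               ; hWnd = NULL
    lea  rdx, msg_text          ; lpText
    lea  r8, msg_title          ; lpCaption
    xor  r9d, r9d               ; uType = MB_OK
    call MessageBoxA            ; RSP is a multiple of 10h at the call
    xor  ecx, ecx               ; exit code 0
    call ExitProcess            ; does not return, so no epilogue is needed
start endp
end
```

With ml64 the entry point would be named on the linker command line; with UASM the last line would be `end start`, as in the example earlier in the thread.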


JK

nidud

deleted

jj2007

Quote from: JK on July 03, 2020, 04:14:21 AM
In 64 bit I can do all the things with the stack I can do in 32 bit as long as this rule is met - is this correct?

Many but not all the things :cool:

Quote from: jj2007 on July 01, 2020, 07:47:43 AM
check carefully the concept of shadow space, e.g. googling for shadow space x64

hutch--

In win64 FASTCALL, each argument occupies a 64 bit slot. The first 4 arguments are passed in 4 registers, rcx, rdx, r8 and r9, and the 4 stack slots that correspond to them are what is called "shadow space". Any additional arguments get written to the higher addresses on the stack, so you have a format something like this.

reg | reg | reg | reg | mem | mem | mem | etc ......

If you pass arguments of different sizes from BYTE to QWORD, they are all written to 64 bit addresses and the important thing here is the stack remains at the same alignment. By writing arguments according to the 64 bit Windows calling convention, you don't have to balance the stack on exit with a "ret number", you just use a "ret".
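
As a rough illustration of that layout, here is a hand-written call with seven arguments. It is a sketch, assuming ml64-style syntax and no assembler-generated prologue; `open_file` and `file_name` are hypothetical names, and the numeric constants are the usual Windows values for the flags named in the comments.

```asm
extern CreateFileA:proc                 ; resolved from kernel32.lib

.data
file_name db "test.txt", 0              ; hypothetical file name

.code
open_file proc
                                        ; entry: RSP = 16n + 8
    sub  rsp, 38h                       ; 20h shadow space + 18h for arguments 5 to 7
    lea  rcx, file_name                 ; arg 1: lpFileName
    mov  edx, 80000000h                 ; arg 2: GENERIC_READ
    mov  r8d, 1                         ; arg 3: FILE_SHARE_READ
    xor  r9d, r9d                       ; arg 4: lpSecurityAttributes = NULL
    mov  qword ptr [rsp+20h], 3         ; arg 5: OPEN_EXISTING
    mov  qword ptr [rsp+28h], 80h       ; arg 6: FILE_ATTRIBUTE_NORMAL
    mov  qword ptr [rsp+30h], 0         ; arg 7: hTemplateFile = NULL
    call CreateFileA                    ; file handle or INVALID_HANDLE_VALUE in rax
    add  rsp, 38h                       ; the caller releases its own frame
    ret                                 ; plain ret, no "ret number"
open_file endp
end
```

Every argument gets a full 8 byte slot regardless of its real size, which is why the frame size stays predictable.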

I know for certain that UASM uses the convention correctly and I have no doubt that nidud does so as well. With MASM I had to write the macros that set up the stack for procedure calls and the "invoke" style technique for calling procedures.

It can be done by writing the procedure and the calling technique manually, but it is messy and very unreliable, whereas having this automated makes win64 as easy to use as win32. With all of the extra registers and FASTCALL, win64 is more efficient and has less overhead than the 32 bit STDCALL where you push args and balance the stack on exit.

jj2007

Three years ago CMalcheski wrote a nice article titled Nightmare on (Overwh)Elm Street: The 64-bit Calling Convention. It's fun to read :tongue:

Biterider

Hi JJ

Really fun article. Fortunately, I did NOT find it before ...
Otherwise I would not have followed the 64bit path so quickly. :biggrin:


Biterider

jj2007

Hi Biterider,

I've invested my fair share in the jinvoke macro (356 lines), plus prolog+epilog (200), it works fine with Masm and the Watcom clones alike, it even counts and checks parameters so that Hutch doesn't have to hold my hot little hand, but... I just don't see any compelling reason to abandon 32-bit coding  :bgrin:

hutch--

> I just don't see any compelling reason to abandon 32-bit coding  :bgrin:

Except speed, power, memory, twice as many registers etc etc .... You can hide in a small world of 32 bit, but it will never get bigger, whereas 64 bit is the future.  :biggrin: