RBP vs RSP stack frames

Started by johnsa, March 24, 2017, 11:27:55 PM


aw27

Quote from: johnsa on March 25, 2017, 02:51:57 AM
the invokes will assume they can use [RSP+0] -> [RSP+x] to fill in the parameters..
So if you SUB rsp,Y somewhere in the proc .. invokes would overwrite your dynamic stack allocation ?
No, you are simply "rebasing" the stack pointer. If the function is not a leaf, then before calling a subroutine you will have to:
1) subtract the usual 32 bytes
2) align the stack.
On return:
add the usual 32 bytes + bytes used for stack alignment if any.
After that you will be as you were before the call :)
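The steps above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: DynAllocDemo and SomeFunc are hypothetical names, the byte count is assumed to arrive in RCX, and the proc is assumed to be assembled with no automatic prologue/epilogue. The size is rounded up to a multiple of 16, so no extra alignment bytes are needed here. Unwind information and stack probing for allocations larger than a page are left out.

```asm
; Sketch only. SomeFunc is a hypothetical external: extern SomeFunc:proc
DynAllocDemo proc
    push rbp                 ; entry RSP is 8 mod 16, the push makes it 16-aligned
    mov  rbp, rsp            ; RBP anchors the frame, RSP is free to move

    lea  rax, [rcx+15]       ; RCX = requested byte count (assumed parameter)
    and  rax, -16            ; round up to a multiple of 16
    sub  rsp, rax            ; dynamic allocation: RSP now points at the buffer
    mov  rcx, rsp            ; pass the buffer as first argument

    sub  rsp, 32             ; 1) the usual 32 bytes, placed below the buffer
                             ; 2) RSP is still 16-aligned, nothing more to do
    call SomeFunc            ; callee may write [rsp] to [rsp+31], not the buffer
    add  rsp, 32             ; back to where RSP was before the call

    mov  rsp, rbp            ; release the dynamic allocation
    pop  rbp
    ret
DynAllocDemo endp
```

Because the 32 bytes are reserved after the SUB that creates the buffer, the callee's parameter area sits below the dynamic allocation and does not overwrite it.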

hutch--

This is what I get with 64 bit MASM using a custom prologue/epilogue. The entry/exit code is small, and for high level code it's easily fast enough. For low level code you don't use a stack frame.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL a1    :QWORD
    LOCAL a2    :QWORD
    LOCAL a3    :QWORD
    LOCAL a4    :QWORD

    mov a1, 1
    mov a2, 2
    mov a3, 3
    mov a4, 4

    xor rcx, rcx
    call ExitProcess

    ret

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end

comment * +++++++++++++++++++++++++++

segment .text
enter 0x80, 0x0
sub rsp, 0x80
mov qword ptr [rbp-0x68], 0x1
mov qword ptr [rbp-0x70], 2
mov qword ptr [rbp-0x78], 3
mov qword ptr [rbp-0x80], 4
xor rcx, rcx
call qword ptr [ExitProcess]
leave
ret
* +++++++++++++++++++++++++++++++++++


The disassembly in detail.

.text:0000000140001000 C8800000                   enter 0x80, 0x0
.text:0000000140001004 4881EC80000000             sub rsp, 0x80
.text:000000014000100b 48C7459801000000           mov qword ptr [rbp-0x68], 0x1
.text:0000000140001013 48C7459002000000           mov qword ptr [rbp-0x70], 2
.text:000000014000101b 48C7458803000000           mov qword ptr [rbp-0x78], 3
.text:0000000140001023 48C7458004000000           mov qword ptr [rbp-0x80], 4
.text:000000014000102b 4833C9                     xor rcx, rcx
.text:000000014000102e FF1560100000               call qword ptr [ExitProcess]
.text:0000000140001034 C9                         leave
.text:0000000140001035 C3                         ret
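For reference, with a nesting level of 0 the ENTER and LEAVE instructions behave like the plain instruction sequences below. This is a hand-written equivalence sketch, not assembler output.

```asm
; enter 80h, 0   behaves like:
    push rbp                 ; save the caller's frame pointer
    mov  rbp, rsp            ; establish the new frame
    sub  rsp, 80h            ; reserve 80h bytes below RBP

; leave          behaves like:
    mov  rsp, rbp            ; discard everything allocated below RBP
    pop  rbp                 ; restore the caller's frame pointer
```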

jj2007

Quote from: johnsa on March 25, 2017, 02:09:02 AM
Do you have an ASM based example of how you'd handle dynamically allocating the stack ?

StackBuffer()


johnsa

Quote from: jj2007 on March 25, 2017, 03:21:02 AM
Quote from: johnsa on March 25, 2017, 02:09:02 AM
Do you have an ASM based example of how you'd handle dynamically allocating the stack ?

StackBuffer()

That looks interesting! And I guess that's x64 as well as x86 ?

hutch--

There has never been a problem with LEAVE; it was the normal cleanup in 32 bit MASM, where ENTER was known to be slow. With 64 bit instructions generally being larger than their 32 bit versions, using ENTER does not seem to be a problem, as any high level code is much slower than direct mnemonic code anyway. With pure mnemonic code you would go for not using a stack frame, as your total call overhead is then a simple CALL/RET.
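A frameless procedure of that kind might look like the sketch below. AddTwo is a hypothetical name, and the code assumes the Win64 convention (arguments in RCX and RDX, result in RAX) and that the assembler emits no prologue for a proc without locals or arguments.

```asm
; Sketch only: leaf procedure, no stack frame, no locals
AddTwo proc
    lea rax, [rcx+rdx]       ; result = first argument + second argument
    ret                      ; total overhead is the CALL and this RET
AddTwo endp
```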

jj2007

Quote from: johnsa on March 25, 2017, 03:28:42 AM
That looks interesting! and I guess thats x64 as well as x86 ?

No, StackBuffer() is 32-bit only so far.

Re enter+leave:
Quote from: jj2007 on July 27, 2016, 12:56:50 AM
Quote from: TWell on July 26, 2016, 12:00:30 AM
EDIT: How about testing in x64 enter/leave and rsp sub/add ?

Saw your edit only now, sorry. If I remember well, we tested that for 32-bit code in the Lab; enter was slow, leave was fast.

P.S.: Made a few tests, and for a naked procedure, enter is about 15% slower than push rbp + mov rbp, rsp

Which means a cycle or so. As Hutch wrote above, if it's really speed critical, you would use only registers + CALL + RET.

And if you want it really fast, i.e. the extra cycle for enter slows your algo down, then your design is wrong. Short procedures in speed critical loops are nonsense, drop the call and the ret and use a macro, or "inline" it by hand.
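As a rough illustration of the macro route, a short routine can be expanded in place so that no CALL or RET is executed at all. ClampMax is a hypothetical helper invented for this sketch. It does an unsigned compare and assumes the limit is an immediate that fits the instruction encoding.

```asm
; Sketch only: inline "procedure" as a macro
ClampMax MACRO reg, limit
    LOCAL done               ;; unique label for every expansion
    cmp reg, limit
    jbe done                 ;; already at or below the limit
    mov reg, limit
done:
ENDM

; usage inside a speed critical loop:
;     ClampMax rax, 255
```

Each use expands to three instructions at the call site, which trades a little code size for the removed call overhead.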