The technique is very simple. When you call a leaf procedure with no stack frame, it is entered with RSP misaligned by 8 bytes, because the call pushes an 8-byte return address. With pure mnemonic code it does not matter, but if you try to call other code from it, the call fails due to the misalignment. The solution is simple: sub rsp by 8 bytes before the call and end the leaf proc with "ret 8". The leaf proc is then aligned the same as the calling proc, and you can make high level calls from within the leaf proc.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
STACKFRAME
entry_point proc
LOCAL ptxt :QWORD
sas ptxt, "This is a test of a new stack frame macro"
rcall MessageBox,0," RSP Before",str$(rsp),0
sub rsp, 8
rcall testproc,ptxt
rcall MessageBox,0," RSP After",str$(rsp),0
rcall ExitProcess,0
ret
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
testproc proc
rcall MessageBox,0,rcx,"Title",0
ret 8
testproc endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
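To see why the two adjustments balance, here is the same sub rsp, 8 / ret 8 pattern stripped down to plain ml64 with the RSP arithmetic written out as comments. This is only a sketch: the starting value 1000h is made up for illustration, and caller and leaf are placeholder names, not anything from the SDK.

```asm
; Sketch only: assumes RSP = 1000h (16-byte aligned) at the marked point.
.code
caller proc
    ; RSP = 1000h here (assumed, 16-byte aligned)
    sub rsp, 8          ; RSP = 0FF8h
    call leaf           ; pushes the 8-byte return address, RSP = 0FF0h on entry
    ; RSP = 1000h again once leaf has executed "ret 8"
    ret
caller endp

leaf proc
    ; RSP = 0FF0h: a multiple of 16, the same alignment the caller had
    ret 8               ; pops the return address (RSP = 0FF8h), then adds 8 (RSP = 1000h)
leaf endp
end
```

Without the sub rsp, 8 the call alone would leave RSP at 0FF8h on entry, which is 8 bytes off a 16-byte boundary.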
I think it's better to always assume RSP is misaligned on any proc entry and adjust RSP within the proc. Using SUB RSP,8 and RET 8 seems counter-intuitive and means there are now two places to remember to add code.
On proc entry I always assume the stack is misaligned. With no stack frame, a simple PUSH will align it (and save a few bytes).
push rbx ;aligns RSP to 16
;tricks
sub ebx,ebx ;can be useful as a zero byte/word/dword/qword
lea rcx,[rbx+4] ;same as mov ecx,4 but shorter
mov var1,rbx ;shorter than mov var1,0
;
pop rbx
ret
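For anyone wondering where the "few bytes" come from, here is a side-by-side sketch of the two ways of realigning inside the proc, with the standard x64 encodings in the comments. The proc names are placeholders and the bodies are left empty.

```asm
; Sketch only: both procs realign RSP on entry and restore it before ret.
.code
align_with_push proc
    push rbx            ; 53            1 byte, RSP moves down by 8
    ; ... body that calls other code ...
    pop rbx             ; 5B            1 byte
    ret
align_with_push endp

align_with_sub proc
    sub rsp, 8          ; 48 83 EC 08   4 bytes
    ; ... body that calls other code ...
    add rsp, 8          ; 48 83 C4 08   4 bytes
    ret
align_with_sub endp
end
```

That is 2 bytes against 8 for the same alignment, and the push version also hands you a preserved register to play with.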
Any combination will work: sub before the call and add after it, or a sub at the start of the leaf proc and an add before the ret. You can automate any variant with a macro. The one above was the simplest in terms of calling code, but they all work.
This is probably the easiest to use.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
; -------------------------
; maintain caller alignment
; -------------------------
calla MACRO args:VARARG
sub rsp, 8
rcall args
add rsp, 8
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
STACKFRAME
entry_point proc
LOCAL ptxt :QWORD
sas ptxt, "This is a test of a new stack frame macro"
rcall MessageBox,0," RSP Before",str$(rsp),0
calla testproc,ptxt
rcall MessageBox,0," RSP After",str$(rsp),0
rcall ExitProcess,0
ret
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
NOSTACKFRAME
testproc proc
rcall MessageBox,0,rcx,"Title",0
ret
testproc endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
I still think the alignment should happen within the called proc; do it there if the proc calls others.
64-bit code can get "bloated". Using the calla macro could result in code like this:
sub rsp,8
rcall args
add rsp,8
sub rsp,8
rcall args
add rsp,8
...
:biggrin:
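If the back-to-back adjustments are the objection, one hand-written alternative (not something the calla macro does by itself) is to adjust once around a run of calls. A minimal sketch in plain ml64, with placeholder proc names:

```asm
; Sketch only: one sub/add pair shared by consecutive calls.
.code
first_leaf proc         ; placeholder callee
    ret
first_leaf endp

second_leaf proc        ; placeholder callee
    ret
second_leaf endp

caller proc
    sub rsp, 8          ; adjust once
    call first_leaf
    call second_leaf    ; RSP is unchanged between the calls, so no re-adjust is needed
    add rsp, 8          ; restore once
    ret
caller endp
end
```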
You might like this then. No manual stack twiddling at all.
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include64\masm64rt.inc
LeafProc MACRO procname, flag, argbytes, localbytes, reglist, userparms:VARARG
sub rsp, 8
EXITM <localbytes>
ENDM
EndLeaf MACRO procname, flag, argbytes, localbytes, reglist, userparms:VARARG
add rsp, 8
ret
ENDM
LEAFPROC MACRO
OPTION PROLOGUE:LeafProc
OPTION EPILOGUE:EndLeaf
ENDM
.code
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
entry_point proc
rcall testit
waitkey
rcall ExitProcess,0
entry_point endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
LEAFPROC
testit proc
rcall MessageBox,0,"Leaf Proc Here","About",MB_OK
ret
testit endp
STACKFRAME
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end
comment #
sub_140001033 proc
.text:0000000140001033 4883EC08 sub rsp, 8
.text:0000000140001037 49C7C100000000 mov r9, 0x0
.text:000000014000103e 4C8B0551100000 mov r8, qword ptr [0x140002096]
.text:0000000140001045 488B1563100000 mov rdx, qword ptr [0x1400020af]
.text:000000014000104c 4833C9 xor rcx, rcx
.text:000000014000104f FF15FF110000 call qword ptr [MessageBoxA]
.text:000000014000104f
.text:0000000140001055 4883C408 add rsp, 8
.text:0000000140001059 C3 ret
#
:biggrin:
I still dislike macros though...
LeafProc MACRO procname, flag, argbytes, localbytes, reglist, userparms:VARARG
push rcx
EXITM <localbytes>
ENDM
EndLeaf MACRO procname, flag, argbytes, localbytes, reglist, userparms:VARARG
pop rcx
ret
ENDM
more bloat :biggrin:
49C7C100000000 mov r9, 0x0
edited: changed rax to rcx because rax is used to return a value sometimes.
I will have a play with that macro when I get time. If the argument is a literal "0" it does the xor reg, reg, but if it's a named constant such as MB_OK, you get mov reg, 0.
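For reference, these are the sizes involved when zeroing r9. The first line is the form from the disassembly above, and the rest are the usual shorter alternatives. zero_forms is only a placeholder proc to hold the instructions, and the xor encodings assume the 33 opcode form that ml64 used for xor rcx, rcx in the same listing.

```asm
; Sketch only: four ways to put zero in r9, with their encodings.
.code
zero_forms proc
    mov r9, 0           ; 49 C7 C1 00 00 00 00   7 bytes
    mov r9d, 0          ; 41 B9 00 00 00 00      6 bytes, upper 32 bits are cleared too
    xor r9, r9          ; 4D 33 C9               3 bytes
    xor r9d, r9d        ; 45 33 C9               3 bytes, upper 32 bits are cleared too
    ret
zero_forms endp
end
```

Unlike mov, the xor forms change the flags, which does not matter when loading arguments just before a call.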
If you adjust rsp and also make a call, then you no longer have a leaf function, and for exception handling purposes there should also be xdata and pdata, as with non-leaf functions.
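As a rough illustration of what that involves in ml64, the FRAME attribute together with the .allocstack and .endprolog directives makes the assembler emit the unwind data for a proc. This is a minimal sketch, not the SDK's own prologue: the proc names are placeholders, and 28h is chosen as 20h of shadow space plus 8 to realign RSP.

```asm
; Sketch only: a non-leaf proc that declares its stack adjustment for unwinding.
.code
callee proc                 ; placeholder true leaf: makes no calls, leaves RSP alone
    ret
callee endp

aligned_proc proc frame     ; FRAME asks ml64 to generate pdata/xdata for this proc
    sub rsp, 28h            ; 20h shadow space + 8 to bring RSP back to a 16-byte boundary
    .allocstack 28h         ; record the allocation in the unwind info
    .endprolog              ; marks the end of the prologue
    call callee
    add rsp, 28h
    ret
aligned_proc endp
end
```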
That's why I put the stack alterations in a prologue/epilogue. A stack frame proc provides more than correct alignment (LOCALs etc.), whereas the prologue/epilogue above does nothing more than align the stack pointer. It is properly an aligned, no stack frame prologue/epilogue technique.