Until now I have used the STACKFRAME macro, but I must learn how to calculate shadow space myself. The API parameters are easy to calculate.
Calculating the locals in this example is what puzzles me.
Manually counting size of all locals = 454
neg 454 = FFFFFFFFFFFFFE3A
aligning 16 - 454 % 16 => +10 bytes => sub rsp,10
entry_point proc
local wsaData:WSADATA
local _rdi:qword
local _rsi:qword
local w:dword
local h:dword
local i:dword
local ipAddr[16]:byte
local lnght:qword
This function crashes because the stack is not aligned.
;=== Initiate Windows Sockets ===
lea rdx,wsaData
mov cx,202h
call WSAStartup
test rax,rax
Built with ml64, the listing shows:
16: entry_point proc
00007FF713EE203C push rbp
00007FF713EE203D mov rbp,rsp
00007FF713EE2040 add rsp,0FFFFFFFFFFFFFE38h ; = -456
17: local wsaData:WSADATA
18: local _rdi:qword
19: local _rsi:qword
20: local w:dword
21: local h:dword
22: local i:dword
23: local ipAddr[16]:byte
24: local lnght:qword
25:
aligning 16 - 456 % 16 => +8 bytes => sub rsp,8 and the function is working.
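The padding arithmetic above can also be left to the assembler instead of being counted by hand. The following is only a sketch: it assumes the ml64 prologue (push rbp / mov rbp,rsp) has already left RSP 16-byte aligned before the locals are subtracted, it takes the 456 from the listing, and it also reserves the 32 bytes of shadow space that a Win64 callee is allowed to write to. LOCALS_SIZE, SHADOW_SIZE, ALIGN_PAD and EXTRA_STACK are names made up for this illustration.

```masm
; Hypothetical constants, evaluated at assembly time.
LOCALS_SIZE = 456                                  ; bytes ml64 reserved (add rsp,-456 in the listing)
SHADOW_SIZE = 32                                   ; home space for rcx, rdx, r8, r9
ALIGN_PAD   = (16 - (LOCALS_SIZE mod 16)) mod 16   ; 8 for 456; it would be 10 for 454
EXTRA_STACK = ALIGN_PAD + SHADOW_SIZE              ; 40 = 28h

    ; placed after the LOCAL lines, before the first call:
    sub rsp, EXTRA_STACK                           ; RSP is 16-byte aligned, lowest 32 bytes are shadow space
```

With only sub rsp,8 the call is aligned, but the callee's shadow space then overlaps the last locals, so adding the 32 bytes looks like the safer choice.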
WSADATA struct
wVersion dw ?
wHighVersion dw ?
iMaxSockets dw ?
iMaxUdpDg dw ?
lpVendorInfo dq ?
szDescription db WSADESCRIPTION_LEN+1 dup (?)
szSystemStatus db WSASYS_STATUS_LEN+1 dup (?)
WSADATA ends
Can anyone explain why I get 454 bytes when I calculate the size of the locals, while ml64 gets 456?
With kind regards
hint: 456 is exactly divisible by 8
ml64 adds two bytes of padding to achieve 8-byte alignment
otherwise the stack would be out of alignment
That is just a best guess
WSADATA is 408 bytes
408 + 3 * 8 + 3 * 4 + 16 = 460
Quote from: TimoVJL on August 21, 2022, 04:37:15 PM
408 + 3 * 8 + 3 * 4 + 16 = 460
Shouldn't the stack remain aligned to 8 bytes?
That was my basic understanding.
456 / 8 = 57
460 / 8 = 57.5
I found this regarding stack alignment and shadow space:
Quote from: qWord on May 02, 2013, 07:12:01 AM
you must take care of stack: it must be aligned by 16 and the shadow space for the register arguments must be allocated...BTW2: there is no need to zero unused parameters.
Only the space for the variables was counted, so next add the alignment padding and the additional stack space for called functions, if needed.
PS: without optimization
the MS C/C++ compiler reserves 1E8h = 488 bytes of stack in total.
Clang 12 reserves 208h = 520
The general drift is: if you start with an aligned stack, declare the LOCAL values in order from biggest to smallest.
Look to that with hexadecimal or binary eyes. A multiple of 16 means "ends with zero" in hexadecimal.
When your program is loaded in memory (at the entry point), the rsp register will end with 8, like rsp=???????8h. To be able to call a function, rsp must be aligned to 16, so it should look like rsp=???????0h just before the call instruction.
entry_point stack == ???????8h (not aligned)
rsp-8h= ???????0h (aligned)
Once the stack is aligned like this, allocate an even number of qwords rather than an odd number and it stays aligned. If you need to deal with dwords, four of them keep the stack aligned.
entry:
sub rsp, 1*8 ;align stack
sub rsp, 4*8 ;stack stays aligned
;supposing qwords below:
;(first four arguments go in rcx, rdx, r8, r9)
push 1 ;stack not aligned (remember that push/call subtract from rsp and pop/ret add)
push 2 ;stack aligned
push 3 ;stack not aligned
push foo ;stack aligned
call x ;this function needs 3 arguments on the stack; we pushed an extra foo (making the count even) to align the stack
-----------------------------------------------------------
I was thinking of a dispatch function, but in my tests (linux) this was not efficient.
This code is untested; I'm writing it from memory, but you get the idea.
eg:
rsp=???????8h
push 1 ;stack aligned ;rsp=???????0h
push 2 ;stack not aligned ;rsp=???????8h
mov rax, offset function_name
call dispatch
dispatch proc ;this procedure is supposed to align callers stack
mov rsi,rsp
and rsi,0000000fh
jnz @F
push offset return_value
@@:
jmp rax
return_value:
ret
dispatch endp
OK guys, I don't understand what you are writing about.
This is what I know. If the entry_point proc is a non-leaf procedure, I need to reserve space for the parameters. For example, MessageBox has four parameters, so I need to reserve 4*8 bytes (already aligned) plus 8 bytes for the call return address. So sub rsp,40.
I have defined WSADATA and the size is 192h bytes. I have tested with mov rax,sizeof WSADATA so it is correct. In total, locals will be 454 bytes (not aligned), while the ml64.exe proc macro will make it 456 bytes (not aligned). What I wonder about is why there is a difference between my way of counting and ml64.exe's way of counting.
Or am I completely wrong?
If you have the help file, masm64.chm, it covers what you have been asking. Forget the push/call notation of STDCALL: win64 uses its own flavour of FASTCALL, with 4 registers and then stack addresses for any further arguments. With ML64, if you declare a procedure with a conventional argument list, MASM creates the stack space for you. If you pass only by register, with no argument list for the procedure, you get no stack space. Both have their virtues.
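As a point of comparison, a register-only call with no argument list might look like the sketch below. It is a minimal example, not taken from the help file: szText and szTitle are made-up names, and the procedure has no locals or parameters, so ml64 generates no prologue and RSP still ends in 8 on entry. The sub rsp,28h is 20h of shadow space plus 8 bytes whose only job is to bring RSP back to a multiple of 16; the return address itself was already pushed by whoever called the entry point.

```masm
extrn MessageBoxA:proc
extrn ExitProcess:proc

.data
szText  db "Hello", 0            ; hypothetical strings for the demo
szTitle db "Win64", 0

.code
entry_point proc
    sub  rsp, 28h                ; 20h shadow space + 8 to realign RSP to 16
    xor  ecx, ecx                ; 1st argument: hWnd = NULL
    lea  rdx, szText             ; 2nd argument: lpText
    lea  r8,  szTitle            ; 3rd argument: lpCaption
    xor  r9d, r9d                ; 4th argument: uType = MB_OK (0)
    call MessageBoxA
    xor  ecx, ecx                ; exit code 0
    call ExitProcess             ; does not return
entry_point endp
end
```

A fifth and sixth argument would go to [rsp + 20h] and [rsp + 28h], just above the shadow space, with the reserved amount enlarged and kept a multiple of 16 plus 8.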
Quote from: minor28 on August 21, 2022, 11:02:34 PM
I have defined WSADATA and the size is 192h bytes.
64-bit WSADATA is 408 decimal (198h), so you missed the alignment padding. That structure isn't BYTE aligned, so try STRUCT 8 to see if it gives the right size, or put filler / padding in the right place.
x64 struct info
WSADATA         408 (198h) bytes
member          offset  size
wVersion        +0h     2h
wHighVersion    +2h     2h
iMaxSockets     +4h     2h
iMaxUdpDg       +6h     2h
lpVendorInfo    +8h     8h
szDescription   +10h    101h
szSystemStatus  +111h   81h
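The members in the table end at 111h + 81h = 192h (402), and the C layout rounds that up to the next multiple of 8 because of the pointer member, which gives 198h (408). One way to get the same size without relying on the STRUCT alignment argument is an explicit filler, sketched below. WSADATA64 and _tailpad are names invented for this example, the two length constants are the values from winsock2.h, and the IF/.ERR guard is only a suggestion for catching a wrong size at assembly time.

```masm
WSADESCRIPTION_LEN equ 256
WSASYS_STATUS_LEN  equ 128

WSADATA64 struct                                      ; hypothetical name, to avoid clashing with WSADATA
    wVersion        dw ?
    wHighVersion    dw ?
    iMaxSockets     dw ?
    iMaxUdpDg       dw ?
    lpVendorInfo    dq ?
    szDescription   db WSADESCRIPTION_LEN+1 dup (?)   ; 101h bytes
    szSystemStatus  db WSASYS_STATUS_LEN+1 dup (?)    ; 81h bytes, ends at offset 192h
    _tailpad        db 6 dup (?)                      ; filler: 192h + 6 = 198h
WSADATA64 ends

if sizeof WSADATA64 ne 198h
    .err <WSADATA64 is not 408 bytes>
endif
```

With this size the locals of the first post add up to 408 + 3*8 + 3*4 + 16 = 460 bytes, as computed earlier in the thread.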
For the original post, one more reference.
1. Start "masmhelp.exe"
2. Select "Reference"
3. Select "Win64 Calling Convention"
This is a detailed explanation of the win64 calling convention.
Now I have studied the fastcall convention, masmhelp, etc. I have tested several variations of the test program below. As the program is now written, it works. What I don't understand is why extra1 and extra2 are taken from rbp+30h and not from rbp+20h. Is it the return address perhaps?
The code is written in Visual Studio 2022 Community and I use the VS libraries.
I would be grateful for comments and recommendations.
OPTION DOTNAME ; required for macro files
option casemap:none ; case sensitive
include \masm32\include64\win64.inc ; main include file
include \masm32\include64\kernel32.inc
include \masm32\include64\user32.inc
include \masm32\include64\Comctl32.inc
include \masm32\include64\ws2_32.inc
public entry_point
WSADATA struct qword
wVersion dw ?
wHighVersion dw ?
iMaxSockets dw ?
iMaxUdpDg dw ?
lpVendorInfo dq ?
szDescription db WSADESCRIPTION_LEN+1 dup (?)
szSystemStatus db WSASYS_STATUS_LEN+1 dup (?)
WSADATA ends
.data?
hInstance dq ?
buffer db 260 dup (?)
.data
szMyText db "My text",0
.code
entry_point proc
local w:char
local z[2]:HWND
local wsadata:WSADATA
sub rsp,2*8
mov w,'y'
lea rax,z
lea r10,szMyText
mov qword ptr [rax],r10
mov qword ptr [rax + 8],r10
xor rcx,rcx
call GetModuleHandle
mov hInstance,rax
mov cx,0202h
lea rdx,wsadata
call testfunction1
xor ecx,ecx
call ExitProcess
ret
entry_point endp
testfunction1 proc ver:word,pWsaData:qword
sub rsp,6*8
mov ver,cx
mov pWsaData,rdx
mov rdx,pWsaData
mov cx,ver
call WSAStartup
test rax,rax
je @F
mov rcx,NULL
lea rdx,szMyText
lea r8,szMyText
mov r9,MB_OK or MB_ICONERROR
call MessageBox
@@:
mov rcx,NULL
lea rdx,szMyText
lea r8,szMyText
mov r9,MB_OK or MB_ICONERROR
mov qword ptr [rsp + 20h],500
mov qword ptr [rsp + 28h],501
call MyMessage
add rsp,6*8
ret
testfunction1 endp
MyMessage proc ;hWin:HWND,pmes:qword,ptitle:qword,pIcon:qword,extra1:qword,extra2:qword
local hWin:HWND ;these four only for testing
local pmes:qword
local ptitle:qword
local pIcon:qword
local extra1:qword
local extra2:qword
sub rsp,4*8
lea rax,extra1
mov rax,qword ptr [rbp + 30h] ;? why not [rbp + 20h]
mov extra1,rax
mov rax,qword ptr [rbp + 38h] ;? why not [rbp + 28h]
mov extra2,rax
mov hWin,rcx ;these four only for testing
mov pmes,rdx
mov ptitle,r8
mov pIcon,r9
call MessageBox
lea rax,extra1
mov rax,extra1
add rax,extra2
add rsp,4*8
ret
MyMessage endp
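On the [rbp + 30h] question, the stack as seen from inside MyMessage can be sketched as follows. This assumes ml64 emitted the usual push rbp / mov rbp,rsp prologue for MyMessage (it has LOCALs, like the proc in the first listing), so two extra qwords, the return address and the saved rbp, sit between rbp and the caller's rsp. ARG5 and ARG6 are made-up names for illustration.

```masm
; "caller rsp" is the value of rsp in testfunction1 just before call MyMessage.
;
;   [rbp + 38h] = caller [rsp + 28h]   6th argument (501)
;   [rbp + 30h] = caller [rsp + 20h]   5th argument (500)
;   [rbp + 28h] = caller [rsp + 18h]   home slot for r9
;   [rbp + 20h] = caller [rsp + 10h]   home slot for r8
;   [rbp + 18h] = caller [rsp + 08h]   home slot for rdx
;   [rbp + 10h] = caller [rsp + 00h]   home slot for rcx
;   [rbp + 08h]                        return address pushed by call
;   [rbp + 00h]                        saved rbp pushed by the prologue

ARG5 equ <qword ptr [rbp + 30h]>       ; hypothetical text equates
ARG6 equ <qword ptr [rbp + 38h]>

    ; usable inside MyMessage after the prologue:
    mov rax, ARG5
    add rax, ARG6                      ; 500 + 501 with the values stored by testfunction1
```

So the 10h difference is the return address plus the saved rbp, and [rbp + 20h] would read the home slot that belongs to r8 instead of extra1.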