
Shadow space for the locals

Started by minor28, August 21, 2022, 04:13:54 PM



Until now I have used the STACKFRAME macro, but I must learn how to calculate shadow space. The API parameters are easy to calculate.

Calculating the locals in this example is puzzling me.

Counting manually, the size of all locals = 454 bytes.
Aligning: 16 - 454 % 16 = 16 - 6 = 10 => +10 bytes => sub rsp,10
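The padding arithmetic above can be checked with a small sketch. Python is used here only as a calculator; `align_up` and `padding` are illustrative names, not part of MASM or the masm64 SDK.

```python
def align_up(size: int, alignment: int = 16) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def padding(size: int, alignment: int = 16) -> int:
    """Bytes needed to bring size up to a multiple of alignment."""
    return align_up(size, alignment) - size


for total in (454, 456):
    print(total, "->", align_up(total), "padding", padding(total))
```

Unlike `16 - size % 16`, `padding` returns 0 rather than 16 when the size is already a multiple of 16.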

entry_point proc
    local wsaData:WSADATA
    local _rdi:qword
    local _rsi:qword
    local w:dword
    local h:dword
    local i:dword
    local ipAddr[16]:byte
    local lnght:qword

This function crashes because the stack is not aligned.

;=== Initiate Windows Sockets ===
lea rdx,wsaData
mov cx,202h
call WSAStartup
test eax,eax        ;WSAStartup returns an int, 0 on success

Built with ml64, the disassembly listing shows:

    16: entry_point proc
00007FF713EE203C  push        rbp 
00007FF713EE203D  mov         rbp,rsp 
00007FF713EE2040  add         rsp,0FFFFFFFFFFFFFE38h   ; = -456, i.e. sub rsp,1C8h
    17:     local wsaData:WSADATA
    18:     local _rdi:qword
    19:     local _rsi:qword
    20:     local w:dword
    21:     local h:dword
    22:     local i:dword
    23:     local ipAddr[16]:byte
    24:     local lnght:qword
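The `add rsp,0FFFFFFFFFFFFFE38h` in the listing is a subtraction written as an addition of a negative number. A short sketch that reads the immediate as a signed 64-bit two's-complement value (`as_signed64` is an illustrative helper name):

```python
def as_signed64(value: int) -> int:
    """Interpret a 64-bit unsigned immediate as a signed two's-complement number."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


imm = 0xFFFFFFFFFFFFFE38
print(as_signed64(imm), hex(-as_signed64(imm)))  # -456 0x1c8
```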

Aligning: 16 - 456 % 16 = 16 - 8 = 8 => +8 bytes => sub rsp,8, and the function works.

WSADATA struct
    wVersion            dw ?
    wHighVersion        dw ?
    iMaxSockets         dw ?
    iMaxUdpDg           dw ?
    lpVendorInfo        dq ?
    szDescription       db WSADESCRIPTION_LEN+1 dup (?)
    szSystemStatus      db WSASYS_STATUS_LEN+1 dup (?)
WSADATA ends

Can anyone explain why I get 454 bytes when I calculate the size of the locals, but 456 when ml64 calculates it?

With kind regards


Hint: 456 is exactly divisible by 8.

Two bytes of padding are added by ml64 to achieve 8-byte alignment;

otherwise the stack would be out of alignment.

That is just a best guess.


WSADATA is 408 bytes

408 + 3 * 8 + 3 * 4 + 16 = 460
May the source be with you
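The two totals in this thread differ only in the size used for WSADATA. A quick sketch of the raw sum of the entry_point locals (`locals_total` is an illustrative name; it ignores any padding between individual locals):

```python
# Sizes in bytes of the locals declared in entry_point (from the first post).
QWORD, DWORD = 8, 4


def locals_total(wsadata_size: int) -> int:
    """Raw sum of the locals: WSADATA + 3 qwords + 3 dwords + a 16-byte array."""
    return wsadata_size + 3 * QWORD + 3 * DWORD + 16


print(locals_total(0x192))  # 454: WSADATA declared without alignment (402 bytes)
print(locals_total(0x198))  # 460: x64 WSADATA padded to 408 bytes
```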


Quote from: TimoVJL on August 21, 2022, 04:37:15 PM
408 + 3 * 8 + 3 * 4 + 16 = 460

Shouldn't the stack remain aligned to 8 bytes?
That was my basic understanding.

456 / 8 = 57
460 / 8 = 57.5

I found this regarding stack alignment and shadow space:
Quote from: qWord on May 02, 2013, 07:12:01 AM
you must take care of stack: it must be aligned by 16 and the shadow space for the register arguments must be allocated...BTW2: there is no need to zero unused parameters.
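A simplified model of the frame arithmetic implied by that rule. It assumes ml64's `push rbp` / `mov rbp,rsp` prologue (as in the listing earlier in the thread), no other pushes, and only integer/pointer arguments; `frame_size` and `rsp_at_call` are hypothetical helper names.

```python
SHADOW = 32  # home space the caller must reserve for rcx, rdx, r8, r9


def frame_size(locals_bytes: int, max_call_args: int) -> int:
    """Bytes to subtract from rsp after `push rbp` / `mov rbp,rsp`.

    Assumes the procedure was entered by a call (rsp % 16 == 8 at entry),
    so pushing rbp brings rsp back to a multiple of 16 and the
    subtraction itself must also be a multiple of 16.
    """
    outgoing = SHADOW + 8 * max(0, max_call_args - 4)  # shadow + stack arguments
    need = locals_bytes + outgoing
    return (need + 15) & ~15


def rsp_at_call(entry_rsp: int, locals_bytes: int, max_call_args: int) -> int:
    """Value of rsp just before the procedure calls another function."""
    rsp = entry_rsp - 8                               # push rbp
    rsp -= frame_size(locals_bytes, max_call_args)    # sub rsp,N
    return rsp


entry = 0x7FF0_0000_0008  # any value ending in 8, as after a call
print(frame_size(454, 4), rsp_at_call(entry, 454, 4) % 16)
```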


Only the space for the variables was counted, so next add the alignment padding and additional stack space for called functions, if needed.

PS: without optimization,
the MS C/C++ compiler reserves a total of 1E8h = 488 bytes of stack;
Clang 12 reserves 208h = 520.


The general drift is: if you start with an aligned stack, add LOCAL values in order from biggest to smallest.
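A rough model of why ordering helps, assuming each local is simply placed at the next offset that is a multiple of its own alignment. This is a simplification; ml64's actual LOCAL placement may differ. The locals `b1`, `q1`, `d1`, `q2` are hypothetical, and sorting by alignment stands in for sorting by size (the two agree for scalar types).

```python
def layout(items):
    """Lay out (name, size, align) items one after another, padding each
    up to its own alignment; return (offsets, total size)."""
    offset, offsets = 0, {}
    for name, size, align in items:
        offset = (offset + align - 1) & ~(align - 1)  # pad up to alignment
        offsets[name] = offset
        offset += size
    return offsets, offset


mixed = [("b1", 1, 1), ("q1", 8, 8), ("d1", 4, 4), ("q2", 8, 8)]
sorted_items = sorted(mixed, key=lambda item: item[2], reverse=True)
print(layout(mixed)[1], layout(sorted_items)[1])
```

The mixed order wastes 7 bytes before `q1` and 4 before `q2`; the sorted order packs the same locals with no internal padding.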


Look at it with hexadecimal or binary eyes. A multiple of 16 means "ends with zero" in hexadecimal.
When your program is loaded in memory (at the entry point), the rsp register will end with 8, like rsp=???????8h. To be able to call a function, rsp should be aligned to 16, so it looks like rsp=???????0h before the call instruction.

entry_point stack == ???????8h (not aligned)
rsp-8h= ???????0h (aligned)
Now, in this situation, you can allocate an even number of qwords rather than an odd number, because the stack is already aligned. If you need to deal with dwords, 4 of them keep the stack aligned.

sub rsp, 1*8    ;align stack
sub rsp, 4*8    ;stack continues aligned

;the first four qword arguments go in rcx, rdx, r8 and r9
push foo        ;padding qword, stack not aligned (remember that push/call subtract from rsp and pop/ret add)
push 3          ;stack aligned
push 2          ;stack not aligned
push 1          ;stack aligned; 1 is now the 5th argument, 2 the 6th, 3 the 7th
sub rsp, 4*8    ;shadow space for rcx, rdx, r8, r9; stack stays aligned
call x          ;this function needs 3 arguments on the stack, so foo (an even count) keeps the stack aligned
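Following mineiro's counting (rsp ends in 8h at the entry point, one qword to align, then an even number of qword pushes), a small tracker of rsp shows the effect of a padding push. It assumes Win64's 32-byte shadow space is reserved just before the call; since 32 is a multiple of 16, that does not change the parity. `StackTracker` and `check` are hypothetical names.

```python
class StackTracker:
    """Follows rsp through sub/push and reports 16-byte alignment."""

    def __init__(self, rsp: int):
        self.rsp = rsp

    def sub(self, n: int) -> None:   # sub rsp, n
        self.rsp -= n

    def push(self) -> None:          # push always moves rsp by 8 in 64-bit code
        self.rsp -= 8

    def aligned(self) -> bool:
        return self.rsp % 16 == 0


def check(arg_pushes: int, pad: bool) -> bool:
    s = StackTracker(0x7FF0_0000_0008)  # rsp ends in 8 at the entry point
    s.sub(1 * 8)                        # align stack
    s.sub(4 * 8)                        # locals: an even number of qwords
    if pad:
        s.push()                        # padding qword
    for _ in range(arg_pushes):
        s.push()                        # stack arguments
    s.sub(32)                           # shadow space, a multiple of 16
    return s.aligned()                  # state just before `call`


print(check(3, pad=True), check(3, pad=False))  # True False
```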

I was thinking of a dispatch function, but in my tests (Linux) it was not efficient.
This code was not tested; I'm writing from memory without doing tests, but you get the idea.

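push 1          ;stack aligned          ;rsp=???????0h
push 2          ;stack not aligned      ;rsp=???????8h
mov rax, offset function_name
call dispatch   ;dispatch realigns the stack itself
add rsp, 2*8    ;discard the two pushed values

dispatch proc           ;calls the function whose address is in rax with the stack aligned to 16
push rbx                ;rbx is non-volatile, so save it
mov rbx,rsp             ;remember the possibly unaligned rsp
and rsp,-16             ;round rsp down to a multiple of 16
sub rsp,20h             ;shadow space for the callee, stack stays aligned
call rax
mov rsp,rbx             ;restore rsp
pop rbx
ret
dispatch endp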
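The dispatch idea relies on masking the low four bits of rsp: testing them tells whether rsp is 16-byte aligned, and clearing them rounds rsp down to the nearest aligned address. Because the stack grows downward, rounding down only adds unused space below the existing data, but the original rsp must be kept somewhere so it can be restored after the call. A sketch with illustrative helper names:

```python
def is_aligned16(rsp: int) -> bool:
    """Equivalent of masking with 0Fh and testing for zero."""
    return (rsp & 0xF) == 0


def align_down16(rsp: int) -> int:
    """Equivalent of `and rsp,-16`: clear the low four bits, moving rsp down."""
    return rsp & ~0xF


print(is_aligned16(0x7FF000000018), hex(align_down16(0x7FF000000018)))
```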
I'd rather be this ambulant metamorphosis than to have that old opinion about everything


OK guys, I don't understand what you are writing about.

This is what I know. If the entry_point proc is a non-leaf procedure, I need to reserve space for the parameters. For example, MessageBox has four parameters, so I need to reserve 4*8 bytes (already aligned) plus 8 bytes for the call's return address. So sub rsp,40.

I have defined WSADATA and its size is 192h (402) bytes. I have checked this with mov rax,sizeof WSADATA, so it is correct. In total, the locals come to 454 bytes (not aligned), while the ml64.exe proc macro makes it 456 bytes (also not aligned). What I wonder about is why my way of counting differs from ml64.exe's.

Or am I completely wrong?
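Tracking only rsp modulo 16 makes both cases in this thread easy to check. The sketch assumes rsp % 16 == 8 at entry (a return address was just pushed) and, for the entry_point with locals, ml64's `push rbp` prologue followed by its 456-byte subtraction; `rsp_mod16_at_call` is a hypothetical name. The last case assumes the extra `sub rsp,10` from the first post came on top of the 456 bytes ml64 had already reserved.

```python
def rsp_mod16_at_call(prologue_push_rbp: bool, subtractions) -> int:
    """rsp % 16 just before a call; 0 means the call is properly aligned."""
    rsp = 8                       # at entry: the call pushed an 8-byte return address
    if prologue_push_rbp:
        rsp -= 8                  # push rbp emitted by ml64 for a proc with locals
    for n in subtractions:
        rsp -= n                  # each sub rsp,n
    return rsp % 16


print(rsp_mod16_at_call(False, [40]))      # MessageBox case: sub rsp,40
print(rsp_mod16_at_call(True, [456]))      # ml64 locals only
print(rsp_mod16_at_call(True, [456, 8]))   # plus sub rsp,8
print(rsp_mod16_at_call(True, [456, 10]))  # plus sub rsp,10
```

This checks alignment only; the 32 bytes of shadow space for WSAStartup still have to be reserved below the locals.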


If you have the help file, masm64.chm, it covers what you have been asking about. Forget the STDCALL push/call notation; Win64 uses its own flavour of FASTCALL: 4 registers, then stack addresses for any further arguments. With ML64, if you set up a procedure with a conventional argument list, masm creates the stack space for you. If you pass only by register with no argument list for the procedure, you get no stack space. Both have their virtues.
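A sketch of where each integer or pointer argument goes at the point of the call under that convention (floating-point arguments, which use xmm0 to xmm3, are ignored here; `arg_location` is a hypothetical helper):

```python
REGS = ("rcx", "rdx", "r8", "r9")


def arg_location(n: int) -> str:
    """Where the n-th (1-based) integer/pointer argument goes at the call."""
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= 4:
        return REGS[n - 1]
    # Stack arguments start just above the 32-byte (20h) shadow space.
    return f"[rsp+{0x20 + 8 * (n - 5):X}h]"


print([arg_location(n) for n in range(1, 7)])
```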


Quote from: minor28 on August 21, 2022, 11:02:34 PM
I have defined WSADATA and the size is 192h bytes.
64-bit WSADATA is 408 dec (198h), so you missed the alignment padding. That structure isn't BYTE aligned, so try STRUCT 8 to get the right size, or add filler/padding in the right place.
x64 struct info
WSADATA          408 (198h) bytes
member           offset  size
wVersion         +0h     2h
wHighVersion     +2h     2h
iMaxSockets      +4h     2h
iMaxUdpDg        +6h     2h
lpVendorInfo     +8h     8h
szDescription    +10h    101h
szSystemStatus   +111h   81h
(data ends at 192h = 402; padded to a multiple of 8 = 198h)
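The table can be reproduced with Python's `ctypes`, assuming a 64-bit Python where `c_uint64` is 8-byte aligned (as on x64 Windows and Linux). `lpVendorInfo` is modeled as a plain 64-bit integer rather than a pointer, and `WSADATA64` is an illustrative class name.

```python
import ctypes

WSADESCRIPTION_LEN = 256
WSASYS_STATUS_LEN = 128


class WSADATA64(ctypes.Structure):
    """x64 member order of WSADATA."""
    _fields_ = [
        ("wVersion", ctypes.c_uint16),
        ("wHighVersion", ctypes.c_uint16),
        ("iMaxSockets", ctypes.c_uint16),
        ("iMaxUdpDg", ctypes.c_uint16),
        ("lpVendorInfo", ctypes.c_uint64),
        ("szDescription", ctypes.c_char * (WSADESCRIPTION_LEN + 1)),
        ("szSystemStatus", ctypes.c_char * (WSASYS_STATUS_LEN + 1)),
    ]


for name, _ in WSADATA64._fields_:
    field = getattr(WSADATA64, name)
    print(f"{name:<16} +{field.offset:X}h {field.size:X}h")

# The last member ends at 192h; the struct is padded to a multiple of 8.
end = WSADATA64.szSystemStatus.offset + WSADATA64.szSystemStatus.size
print(f"data ends at {end:X}h, sizeof = {ctypes.sizeof(WSADATA64):X}h")
```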


For the original post, one more reference.

1. Start "masmhelp.exe"
2. Select "Reference"
3. Select "Win64 Calling Convention"

This is a detailed explanation of the win64 calling convention.


Now I have studied the fastcall convention in masmhelp etc. I have tested several variations of the test program below. As the program is now written, it works. What I don't understand is why extra1 and extra2 are read from rbp+30h and not from rbp+20h. Is it the return address, perhaps?

The code is written in Visual Studio 2022 Community and I use the VS libraries.

I would be grateful for comments and recommendations.

OPTION DOTNAME                          ; required for macro files
option casemap:none                     ; case sensitive

include \masm32\include64\     ; main include file

include \masm32\include64\
include \masm32\include64\
include \masm32\include64\

include \masm32\include64\

public entry_point

WSADATA struct qword
    wVersion            dw ?
    wHighVersion        dw ?
    iMaxSockets         dw ?
    iMaxUdpDg           dw ?
    lpVendorInfo        dq ?
    szDescription       db WSADESCRIPTION_LEN+1 dup (?)
    szSystemStatus      db WSASYS_STATUS_LEN+1 dup (?)
WSADATA ends

.data?
hInstance dq ?
buffer db 260 dup (?)

.data
szMyText db "My text",0

.code

entry_point proc
    local w:char
    local z[2]:HWND
    local wsadata:WSADATA

    sub rsp,2*8

    mov w,'y'
    lea rax,z
    lea r10,szMyText
    mov qword ptr [rax],r10
    mov qword ptr [rax + 8],r10

    xor rcx,rcx
    call GetModuleHandle
    mov hInstance,rax

    mov cx,0202h
    lea rdx,wsadata
    call testfunction1

    xor ecx,ecx
    call ExitProcess

entry_point endp

testfunction1 proc ver:word,pWsaData:qword

    sub rsp,6*8
    mov ver,cx
    mov pWsaData,rdx

    mov rdx,pWsaData
    mov cx,ver
    call WSAStartup
    test eax,eax                ;WSAStartup returns an int, 0 on success
    je @F
        mov rcx,NULL
        lea rdx,szMyText
        lea r8,szMyText
        mov r9,MB_OK or MB_ICONERROR
        call MessageBox
    @@:

    mov rcx,NULL
    lea rdx,szMyText
    lea r8,szMyText
    mov r9,MB_OK or MB_ICONERROR
    mov qword ptr [rsp + 20h],500
    mov qword ptr [rsp + 28h],501
    call MyMessage

    add rsp,6*8
    ret                         ;without ret, execution falls through into MyMessage
testfunction1 endp

MyMessage proc ;hWin:HWND,pmes:qword,ptitle:qword,pIcon:qword,extra1:qword,extra2:qword
    local hWin:HWND ;these four only for testing
    local pmes:qword
    local ptitle:qword
    local pIcon:qword
    local extra1:qword
    local extra2:qword

    sub rsp,4*8

    lea rax,extra1
    mov rax,qword ptr [rbp + 30h]   ;? why not [rbp + 20h]
    mov extra1,rax
    mov rax,qword ptr [rbp + 38h]   ;? why not [rbp + 28h]
    mov extra2,rax

    mov hWin,rcx ;these four only for testing
    mov pmes,rdx
    mov ptitle,r8
    mov pIcon,r9
    call MessageBox

    lea rax,extra1
    mov rax,extra1
    add rax,extra2

    add rsp,4*8
    ret
MyMessage endp
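Where [rbp+30h] comes from can be modeled with a few offsets, assuming the `push rbp` / `mov rbp,rsp` prologue that ml64 emits for a proc with locals (as in the listing earlier in the thread). The caller stores the 5th argument at [rsp+20h]; `call` then pushes the 8-byte return address and the prologue pushes rbp, so relative to the new rbp the same slot is 10h bytes further up. The function names are hypothetical.

```python
RETURN_ADDRESS = 8  # pushed by `call`
SAVED_RBP = 8       # pushed by `push rbp` in the prologue


def stack_arg_offset_from_rbp(n: int) -> int:
    """rbp-relative offset of the n-th argument (n >= 5) inside the callee."""
    caller_rsp_offset = 0x20 + 8 * (n - 5)  # [rsp+20h], [rsp+28h], ... at the call
    return caller_rsp_offset + RETURN_ADDRESS + SAVED_RBP


def home_slot_offset_from_rbp(n: int) -> int:
    """rbp-relative offset of the shadow (home) slot for register argument n (1..4)."""
    return 8 * (n - 1) + RETURN_ADDRESS + SAVED_RBP


print(hex(stack_arg_offset_from_rbp(5)), hex(stack_arg_offset_from_rbp(6)))  # 0x30 0x38
```

So [rbp+20h] would land inside the caller's shadow space (the home slot of r9 is at [rbp+28h]), which is why extra1 and extra2 start at [rbp+30h].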