Stack size

Started by JK, March 16, 2022, 09:06:01 AM

JK

Using rather complicated structures (struct ... ends) for local variables in a procedure resulted in a stack overflow. Doing some calculations, I found that the default stack size of 1 MB (according to the MS docs for their VS link.exe) should have been more than enough. Looking at the PE header I can see SizeOfStackReserve = 0x100000 and SizeOfStackCommit = 0x1000, which conforms to the docs.

Increasing the stack size with the /STACK option for link.exe (/STACK:0x200000) doesn't help either. But when I increase the committed size (/STACK:0x100000,0x100000) it runs as expected. That kind of solves my problem, but leaves questions.

Shouldn't the available stack size grow automatically as needed, from the initially committed size up to a maximum of the reserved size? In other words: why do I get a GPF when writing to a local variable that is outside the range of the initially committed stack space but well within the range of the total reserved stack space? (As already mentioned, this GPF goes away when I increase the committed stack size!)

Looking at other executables (e.g. MS' link.exe), this 0x100000 / 0x1000 setting is widely used and obviously works - why doesn't it work in mine? Did I miss a linker option to make this work? My current link options are: /SUBSYSTEM:... /RELEASE /LIBPATH:"..." /OUT:"..."

JK

jj2007

Is it really a stack overflow, or does it throw another exception?
Do you probe the stack? Usually, when your locals exceed around 8k, you get exceptions.

StackBuffer():
Quote
does the stack probing for you; up to about half a megabyte, it is significantly faster than HeapAlloc

hutch--

JK,

Just make the stack size bigger in the linker options. If you have some idea of where the pressure on the stack memory is coming from in your app, see if you can work out by how much and just make the stack allocation big enough. Some time ago when working on sorting algorithms, a quick sort design would take off and crash the stack. I ended up putting in a recursion limit so that if it went berserk, it would stop at a certain recursion depth.

Another alternative is to use allocated memory rather than stack memory, but that would involve a rewrite.

TimoVJL

C compilers add a call to _chkstk if locals are bigger than 4k/8k.
Perhaps MASM has similar routines.
May the source be with you
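
A minimal sketch of the kind of probing such a routine performs, written inline for MASM. It is in the spirit of _chkstk, not a copy of it. It assumes the MASM32 SDK in its default location (masm32rt.inc); PROBE_STEP, DEMO_SIZE and BigLocal are names made up for this illustration. The idea is to touch one byte per 4 KB page, walking downward from EBP to ESP, so that each access lands either in committed memory or in the guard page directly below it.

```asm
; Sketch only: assumes the MASM32 SDK in its default location.
include \masm32\include\masm32rt.inc

PROBE_STEP EQU 4096                ; x86 page size (hypothetical name)
DEMO_SIZE  EQU 11402               ; illustrative size, larger than two pages

.code

Start PROC
    LOCAL BigLocal[DEMO_SIZE]:BYTE ; prologue moves ESP down by more than two pages

    ; Inline probe: no CALL or PUSH before this point, because ESP
    ; may already point below the guard page.
    mov     eax, ebp               ; [EBP] was written by the prologue, so it is committed
@@:
    sub     eax, PROBE_STEP        ; step down one page
    cmp     eax, esp
    jbe     @F                     ; reached the bottom of the frame
    test    byte ptr [eax], 0      ; a read is enough to trigger the guard page
    jmp     @B
@@:
    test    byte ptr [esp], 0      ; touch the lowest page of the frame

    ; The frame has now been touched page by page; zero it bottom-up.
    cld
    xor     eax, eax
    mov     ecx, DEMO_SIZE
    lea     edi, BigLocal
    rep     stosb

    invoke  ExitProcess, 0
Start ENDP

END Start
```

The probe is inlined rather than called because a CALL would itself push a return address at ESP, which is exactly the kind of access that can land beyond the guard page.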

jj2007

Timings for StackBuffer vs HeapAlloc, for a random #bytes between 0 and 512kBytes, zeroed and non-zeroed:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

1969    kCycles for 100 * StackBuffer()
2       kCycles for 100 * StackBuffer(nz)
101     kCycles for 100 * HeapAlloc generate exceptions
2176    kCycles for 100 * HeapAlloc zero mem

1977    kCycles for 100 * StackBuffer()
2       kCycles for 100 * StackBuffer(nz)
97      kCycles for 100 * HeapAlloc generate exceptions
2154    kCycles for 100 * HeapAlloc zero mem

1951    kCycles for 100 * StackBuffer()
2       kCycles for 100 * StackBuffer(nz)
96      kCycles for 100 * HeapAlloc generate exceptions
2162    kCycles for 100 * HeapAlloc zero mem

1953    kCycles for 100 * StackBuffer()
2       kCycles for 100 * StackBuffer(nz)
97      kCycles for 100 * HeapAlloc generate exceptions
2162    kCycles for 100 * HeapAlloc zero mem


The second variant, StackBuffer(nz), only probes the stack, without zeroing the buffer.

JK

Here is some sample code demonstrating my problem:
.DATA

TESTTYPE2 STRUCT
X1 BYTE ?
X2 BYTE ?
Y WORD 3 DUP (?)
Z SDWORD 6 * 6 DUP (?)
TESTTYPE2 ENDS


TESTTYPE3 STRUCT
X1 BYTE ?
X2 BYTE ?
;TT2 TESTTYPE2 70 DUP (<>)                             ; -> runs
TT2 TESTTYPE2 75 DUP (<>)                            ; -> GPF
TESTTYPE3 ENDS


.CODE

Start PROC
LOCAL L_TT2:TESTTYPE2
LOCAL L_TT3:TESTTYPE3

CLD 
XOR EAX, EAX                                          ;set L_TT3 to zero ...
MOV ECX, SIZEOF(TESTTYPE3)
LEA EDI, L_TT3
REP STOSB                                             ;GPF here, if any

int 3

RET
Start ENDP


END Start


The debugger shows a size of 0x2C8A (= 11402 bytes) for L_TT3, which causes an access violation at "REP STOSB". Reducing the size of TESTTYPE3 makes this go away.

REP STOSB writes to stack memory (i.e. zeros TESTTYPE3). According to the PE header, 1 MB of stack memory is reserved and initially 4 KB committed, so this should be more than sufficient - but for some reason it isn't!

A size of about 11000 bytes makes it crash, while a size of about 10000 bytes is acceptable (strangely enough, the exact size at which it starts crashing seems to vary from run to run). 10000 is more than 4K, so more stack memory than the initial 4K must already have been committed, but at about 11000 it keeps crashing.

As already mentioned, I could add /STACK:0x100000,0x100000 to the linker options, which always commits 1 MB to the stack (even if it is not needed). But looking at other executables I almost always see the defaults (1 MB reserved and 4 KB committed), which obviously work there. Why does my executable not work with these defaults?


jj2007

Google for stack guard pages access violation

tenkey

You need to use STD instead of CLD for descending mode and start from the highest address.
It is best to CLD immediately after the REP STOSB because Win32 and Win64 calls expect DF to be cleared (they won't clear it themselves if ascending moves are all they do).
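
A minimal sketch of that top-down zeroing, assuming the MASM32 SDK in its default location (masm32rt.inc). DEMO_SIZE and BigLocal are hypothetical names standing in for SIZEOF(TESTTYPE3) and L_TT3 from the sample; the byte-array local is a simplification, not the original structure.

```asm
; Sketch only: assumes the MASM32 SDK in its default location.
include \masm32\include\masm32rt.inc

DEMO_SIZE EQU 11402                ; same byte count as L_TT3 in the sample

.code

Start PROC
    LOCAL BigLocal[DEMO_SIZE]:BYTE

    std                            ; DF = 1: STOSB decrements EDI
    xor     eax, eax
    mov     ecx, DEMO_SIZE
    lea     edi, BigLocal
    add     edi, DEMO_SIZE - 1     ; EDI -> highest byte of the buffer
    rep     stosb                  ; zero from top to bottom, page after page
    cld                            ; restore DF = 0 before any API call

    invoke  ExitProcess, 0
Start ENDP

END Start
```

Because the writes start near EBP and move toward ESP one byte at a time, every page of the frame is reached in order, so the guard page is hit before anything below it.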

JK

So I should zero out from top to bottom instead of bottom to top. This way I hit these guard pages as they appear and cannot inadvertently skip one, which is the reason for the GPF in my case - I see.

Thanks

JK

daydreamer

For calculations, create a worker thread; that lets you set a separate stack bigger than 1 MB for it.
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

Vortex

Hi Timo,

Here is the allocmem function derived from the Pelles C run-time library :

include Demo.inc

.data

string  db 'This is a test.',0
string2 db '%s',13,10
        db 'Address of the string = %Xh',0

.code

start:

    call    main
    invoke  ExitProcess,0

main PROC

LOCAL pMem:DWORD

; Allocate 10 KB (10240 bytes) on the stack

    invoke  allocmem,10240

    mov     pMem,esp

    invoke  szCopy,ADDR string,esp

    invoke  printf,ADDR string2,pMem,pMem

    add     esp,10240

    ret

main ENDP

END start

jj2007

Quote from: daydreamer on March 17, 2022, 04:55:39 AM
For calculations, create a worker thread; that lets you set a separate stack bigger than 1 MB for it.

Interesting, can you post some code?

daydreamer

Quote from: jj2007 on March 17, 2022, 08:04:48 AM
Quote from: daydreamer on March 17, 2022, 04:55:39 AM
For calculations, create a worker thread; that lets you set a separate stack bigger than 1 MB for it.

Interesting, can you post some code?

threadst equ 20000000 ;threadstack size ca 20MB

.data
varx dd 0,0,0,0 ;receives the thread ID (lpThreadId)
thread1 dd 0,0,0,0 ;thread handles returned by CreateThread
threadcnt dd 0
.code
inc threadcnt
mov esi,threadcnt
;third rv() argument = thread stack size, or 0 if you want the default size of the executable
;fourth rv() argument = ADDR of your workerthread
     mov thread1, rv(CreateThread,NULL,threadst,ADDR workerthread,esi,NULL,ADDR varx)
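
The fragment above is not a complete program. A minimal self-contained sketch of the same idea could look like the following, assuming the MASM32 SDK in its default location (masm32rt.inc). THREAD_STACK, WorkerThread, hWorker and workerId are hypothetical names chosen for this illustration, and the worker body is left as a placeholder.

```asm
; Sketch only: assumes the MASM32 SDK in its default location.
include \masm32\include\masm32rt.inc

THREAD_STACK EQU 20000000          ; requested thread stack size, about 20 MB

WorkerThread PROTO :DWORD

.data?
hWorker   dd ?                     ; thread handle returned by CreateThread
workerId  dd ?                     ; receives the thread ID

.code

start:
    ; CreateThread(lpThreadAttributes, dwStackSize, lpStartAddress,
    ;              lpParameter, dwCreationFlags, lpThreadId)
    invoke  CreateThread, NULL, THREAD_STACK, ADDR WorkerThread, 1, 0, ADDR workerId
    mov     hWorker, eax
    .if eax != 0
        invoke  WaitForSingleObject, hWorker, INFINITE   ; wait for the worker to finish
        invoke  CloseHandle, hWorker
    .endif
    invoke  ExitProcess, 0

WorkerThread PROC lpParam:DWORD
    ; ... calculations that need a deep stack go here ...
    xor     eax, eax               ; thread exit code
    ret
WorkerThread ENDP

END start
```

According to the CreateThread documentation, dwStackSize is treated as the initial commit size unless the STACK_SIZE_PARAM_IS_A_RESERVATION flag is passed in dwCreationFlags, so this call asks for roughly 20 MB to be committed up front. Whether that is what you want depends on the workload.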


jj2007

Oops, that chokes with lots of assembly errors. Did you test it? Can you post the complete code, please?

HSE

Quote from: daydreamer on March 18, 2022, 01:55:33 AM
threadst equ 20000000 ;threadstack size ca 20MB
     mov thread1, rv(CreateThread,NULL,threadst,ADDR workerthread,esi,NULL,ADDR varx)



And now the function workerthread can use 20 MB as locals?
Equations in Assembly: SmplMath