Using rather complicated structures (STRUCT ... ENDS) for local variables in a procedure resulted in a stack overflow. Doing some calculations, I found that the default stack size of 1 MB (according to Microsoft's documentation for their VS link.exe) should have been more than enough. Looking at the PE header I can see SizeOfStackReserve = 0x100000 and SizeOfStackCommit = 0x1000, which conforms to the docs.
Increasing the reserved stack size with the /STACK option of link.exe (/STACK:0x200000) doesn't help either. But when I increase the committed size (/STACK:0x100000,0x100000), it runs as expected. That more or less solves my problem, but leaves some questions open.
Shouldn't the available stack grow automatically as needed, from the initially committed size up to the maximum of the reserved size? In other words: why do I get a GPF when writing to a local variable that lies outside the initially committed stack space but well within the reserved stack space? (As already mentioned, the GPF goes away when I increase the committed stack size!)
Looking at other executables (e.g. Microsoft's link.exe), this 0x100000 / 0x1000 setting is widely used and obviously works there, so why doesn't it work in mine? Did I miss a linker option? My current link options are: /SUBSYSTEM:... /RELEASE /LIBPATH:"..." /OUT:"..."
JK
Is it really a stack overflow, or does it throw another exception?
Do you probe the stack? Usually, when your locals exceed around 8k, you get exceptions.
StackBuffer(): (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1255)
Quote: "does the stack probing for you; up to about half a megabyte, it is significantly faster than HeapAlloc"
JK,
Just make the stack size bigger in the linker options. If you have some idea of where the pressure on the stack memory is coming from in your app, see if you can work out by how much, and make the stack allocation big enough. Some time ago, when working on sorting algorithms, a quicksort design would take off and crash the stack. I ended up putting in a recursion limit so that if it went berserk, it would stop at a certain recursion depth.
Another alternative is to use allocated memory rather than stack memory, but that would involve a rewrite.
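A minimal sketch of such a recursion limit, in C: a plain quicksort (Lomuto partition, last element as pivot) that gives up and returns false once the recursion depth exceeds max_depth, instead of crashing the stack. The function names and the partition scheme are assumptions chosen for illustration, not the code described above.

```c
#include <stdbool.h>
#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition around the last element; returns the pivot's final index. */
static size_t partition(int *a, size_t lo, size_t hi)
{
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot) swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);
    return i;
}

/* Sorts a[lo..hi] (inclusive). Returns false, leaving the array only
   partly sorted, if the recursion would go deeper than max_depth. */
static bool quicksort_limited(int *a, size_t lo, size_t hi,
                              unsigned depth, unsigned max_depth)
{
    if (lo >= hi) return true;
    if (depth > max_depth) return false;      /* stop instead of blowing the stack */
    size_t p = partition(a, lo, hi);
    if (p > lo && !quicksort_limited(a, lo, p - 1, depth + 1, max_depth))
        return false;
    return quicksort_limited(a, p + 1, hi, depth + 1, max_depth);
}

/* Convenience wrapper for a whole array of n elements. */
static bool sort_ints(int *a, size_t n, unsigned max_depth)
{
    return n < 2 || quicksort_limited(a, 0, n - 1, 0, max_depth);
}
```

An already sorted input is the worst case for this pivot choice: each call peels off only one element, so the depth grows with the array length and the limit trips. Another common fix, not used here, is to recurse only into the smaller partition and loop over the larger one, which bounds the depth at about log2(n).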
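A minimal sketch of the heap alternative in C: the large structure is obtained with calloc, which also returns zeroed memory, so no stack-clearing loop is needed, and the caller frees it. Work and work_create are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical work structure, large enough to be a problem as a local
   (256 KB where int is 4 bytes). */
typedef struct {
    int values[64 * 1024];
} Work;

/* Instead of declaring `Work w;` as a local, get zeroed memory from the heap.
   Returns NULL on failure; the caller must free() the result. */
static Work *work_create(void)
{
    return calloc(1, sizeof(Work));
}
```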
C compilers add a call to _chkstk if the locals are bigger than 4k/8k.
Perhaps MASM has similar routines.
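As a rough illustration of what such a probe does, here is a C sketch that only computes the offsets below the incoming stack pointer that a _chkstk-style helper would touch for a frame of a given size: one per 4 KB page, walking downward, then the bottom of the frame. probe_offsets and STACK_PAGE are hypothetical names; the real helper is assembly in the CRT and its details may differ between versions.

```c
#include <stddef.h>

#define STACK_PAGE 4096u   /* assumed page size */

/* Fills `out` with the offsets (bytes below the incoming stack pointer)
   that a page-by-page probe would touch for a frame of `size` bytes.
   Touching pages in this order never skips a guard page.
   Returns the number of offsets written. */
static size_t probe_offsets(size_t size, size_t *out, size_t max_out)
{
    size_t n = 0, off = 0;
    while (size - off >= STACK_PAGE && n < max_out) {  /* a full page remains */
        off += STACK_PAGE;
        out[n++] = off;
    }
    if (off < size && n < max_out)                      /* partial last page */
        out[n++] = size;
    return n;
}
```

For the 11402-byte local discussed in this thread, the probe touches 4096, 8192 and 11402 bytes below the stack pointer, in that order.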
Timings for StackBuffer vs HeapAlloc, for a random number of bytes between 0 and 512 kB, zeroed and non-zeroed:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)
1969 kCycles for 100 * StackBuffer()
2 kCycles for 100 * StackBuffer(nz)
101 kCycles for 100 * HeapAlloc generate exceptions
2176 kCycles for 100 * HeapAlloc zero mem
1977 kCycles for 100 * StackBuffer()
2 kCycles for 100 * StackBuffer(nz)
97 kCycles for 100 * HeapAlloc generate exceptions
2154 kCycles for 100 * HeapAlloc zero mem
1951 kCycles for 100 * StackBuffer()
2 kCycles for 100 * StackBuffer(nz)
96 kCycles for 100 * HeapAlloc generate exceptions
2162 kCycles for 100 * HeapAlloc zero mem
1953 kCycles for 100 * StackBuffer()
2 kCycles for 100 * StackBuffer(nz)
97 kCycles for 100 * HeapAlloc generate exceptions
2162 kCycles for 100 * HeapAlloc zero mem
#2, StackBuffer(nz), only probes the stack, without zeroing the buffer.
Here is some sample code demonstrating my problem:
.386
.MODEL FLAT, STDCALL
OPTION CASEMAP:NONE
.DATA
TESTTYPE2 STRUCT
X1 BYTE ?
X2 BYTE ?
Y WORD 3 DUP (?)
Z SDWORD 6 * 6 DUP (?)
TESTTYPE2 ENDS
TESTTYPE3 STRUCT
X1 BYTE ?
X2 BYTE ?
;TT2 TESTTYPE2 70 DUP (<>) ; -> runs (SIZEOF TESTTYPE3 = 10642)
TT2 TESTTYPE2 75 DUP (<>) ; -> GPF (SIZEOF TESTTYPE3 = 11402 = 2C8Ah)
TESTTYPE3 ENDS
.CODE
Start PROC
LOCAL L_TT2:TESTTYPE2
LOCAL L_TT3:TESTTYPE3
CLD
XOR EAX, EAX ;set L_TT3 to zero ...
MOV ECX, SIZEOF(TESTTYPE3)
LEA EDI, L_TT3
REP STOSB ;GPF here, if any
int 3
RET
Start ENDP
END Start
The debugger shows a size of 0x2C8A (= 11402 bytes) for L_TT3, which causes an access violation at REP STOSB. Reducing the size of TESTTYPE3 makes this go away.
REP STOSB writes to stack memory (i.e. zeroes L_TT3). According to the PE header, 1 MB of stack memory is reserved and 4 KB is initially committed, so this should be more than sufficient, but for some reason it isn't!
A size of about 11000 bytes makes it crash, while a size of about 10000 bytes is fine (strangely enough, the exact size at which it starts crashing seems to vary from run to run). 10000 is more than 4K, so more stack memory than the initial 4K must have been committed, yet at 11000 it keeps crashing.
As already mentioned, I could add /STACK:0x100000,0x100000 to the linker options, which always commits 1 MB to the stack (even if it is not needed). But in other executables I almost always see the defaults (1 MB reserved and 4 KB committed), which obviously work there. Why does my executable not work with these defaults?
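The size the debugger reports can be checked by hand. Below is a C mirror of the two MASM structures (TestType2 and TestType3 are hypothetical names). `#pragma pack(1)` reproduces the byte packing that the reported 0x2C8A implies; with natural alignment, TestType3 would be padded to 11404 bytes.

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  x1;
    uint8_t  x2;
    uint16_t y[3];        /* WORD 3 DUP (?)        ->   6 bytes */
    int32_t  z[6 * 6];    /* SDWORD 6 * 6 DUP (?)  -> 144 bytes */
} TestType2;              /* 1 + 1 + 6 + 144 = 152 bytes */

typedef struct {
    uint8_t   x1;
    uint8_t   x2;
    TestType2 tt2[75];    /* 75 * 152 = 11400 bytes */
} TestType3;              /* 2 + 11400 = 11402 bytes = 0x2C8A */
#pragma pack(pop)

/* Size of a TESTTYPE3 whose array holds `n` TESTTYPE2 elements. */
static unsigned long testtype3_size(unsigned n)
{
    return 2ul + (unsigned long)n * sizeof(TestType2);
}
```

With 70 elements the structure is 10642 bytes; with 75 it is 11402. Both are larger than two 4 KB pages.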
Google for stack guard pages access violation
You need to use STD instead of CLD for descending mode, point EDI at the highest address of the buffer and fill downward.
It is best to CLD immediately after the REP STOSB, because Win32 and Win64 calls expect DF to be clear (they won't clear it themselves if ascending moves are all they do).
So I should zero out from top to bottom instead of bottom to top. This way I hit the guard pages as they appear and cannot inadvertently skip one, which is the reason for the GPF in my case. I see.
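To see why the fill direction matters, here is a toy model in C. It is a hypothetical simplification, not Windows' actual implementation: offsets count bytes below the top of the stack, pages below `committed` are usable, page `committed` is the single guard page, and touching it commits it and moves the guard one page down. touch and zero_local are illustrative names.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_PAGE 4096u   /* assumed page size */

/* Touch the byte `offset` bytes below the top of the stack.
   Returns false for a simulated access violation. */
static bool touch(size_t *committed, size_t offset)
{
    size_t page = offset / STACK_PAGE;
    if (page < *committed) return true;                 /* already committed */
    if (page == *committed) { ++*committed; return true; } /* guard page: grow */
    return false;                                        /* skipped the guard page */
}

/* Zero a local of `size` bytes that starts at the top of the stack.
   top_down = true mimics STD + REP STOSB from the highest address;
   false mimics CLD + REP STOSB from the lowest address. */
static bool zero_local(size_t committed, size_t size, bool top_down)
{
    for (size_t i = 0; i < size; i++) {
        size_t offset = top_down ? i : size - 1 - i;
        if (!touch(&committed, offset)) return false;
    }
    return true;
}
```

With one committed page, a bottom-up fill of 11402 bytes fails on its very first write, which lands one page below the guard page, while the top-down fill walks through the guard page one page at a time and succeeds. In this model a bottom-up fill still works as long as the first write falls within the top two pages (8 KB).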
Thanks
JK
For calculations, create a worker thread; that lets you give it a separate stack bigger than 1 MB.
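In Win32, CreateThread takes the stack size as its second parameter (dwStackSize); unless STACK_SIZE_PARAM_IS_A_RESERVATION is passed in dwCreationFlags, that value is the commit size. So the idea can be tried portably, the sketch below swaps in POSIX threads instead: pthread_attr_setstacksize chooses the worker's stack size, and the worker uses a 2 MB local array. worker, run_with_stack and the sizes are illustrative assumptions.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* 2 MB of locals: more than the 1 MB Windows default discussed here. */
#define BIG_LOCAL (2u * 1024u * 1024u)

static void *worker(void *arg)
{
    unsigned char big[BIG_LOCAL];   /* large local on the worker's own stack */
    memset(big, 1, sizeof big);     /* touch every byte */
    size_t sum = 0;
    for (size_t i = 0; i < sizeof big; i++) sum += big[i];
    *(size_t *)arg = sum;           /* hand the result back to the caller */
    return NULL;
}

/* Runs worker() on a new thread with `stack_bytes` of stack and waits for it.
   Returns 0 on success, otherwise a pthread error code. */
static int run_with_stack(size_t stack_bytes, size_t *result)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) rc = pthread_create(&tid, &attr, worker, result);
    pthread_attr_destroy(&attr);
    if (rc == 0) rc = pthread_join(tid, NULL);
    return rc;
}
```

For example, run_with_stack(16u * 1024u * 1024u, &sum) should return 0 and leave sum equal to 2 * 1024 * 1024, since every byte of the buffer is 1.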
Hi Timo,
Here is the allocmem function, derived from the Pelles C run-time library:
include Demo.inc
.data
string db 'This is a test.',0
string2 db '%s',13,10
db 'Address of the string = %Xh',0
.code
start:
call main
invoke ExitProcess,0
main PROC
LOCAL pMem:DWORD
; Allocate 10 KB (10240 bytes) on the stack
invoke allocmem,10240
mov pMem,esp
invoke szCopy,ADDR string,esp
invoke printf,ADDR string2,pMem,pMem
add esp,10240
ret
main ENDP
END start
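For comparison, a rough C analogue of the demo above uses the non-standard alloca (from <alloca.h> on glibc and the BSDs; MSVC spells it _alloca in <malloc.h>). The memory lives in the function's stack frame and is released automatically on return, much like the add esp,10240 at the end of main. demo_stack_copy is a hypothetical name, and this does not claim to reproduce what allocmem does internally.

```c
#include <alloca.h>   /* glibc/BSD; MSVC: _alloca in <malloc.h> */
#include <stdio.h>
#include <string.h>

/* Copies `text` into 10240 bytes carved off the stack with alloca() and
   formats it, plus the buffer's address, into `out`.
   `text` must be shorter than 10240 bytes. */
static int demo_stack_copy(const char *text, char *out, size_t out_len)
{
    char *p = alloca(10240);          /* released when this function returns */
    strcpy(p, text);
    return snprintf(out, out_len, "%s\nAddress of the string = %p", p, (void *)p);
}
```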
Quote from: daydreamer on March 17, 2022, 04:55:39 AM
For calculations, create a worker thread; that lets you give it a separate stack bigger than 1 MB.
Interesting, can you post some code?
Quote from: jj2007 on March 17, 2022, 08:04:48 AM
Quote from: daydreamer on March 17, 2022, 04:55:39 AM
For calculations, create a worker thread; that lets you give it a separate stack bigger than 1 MB.
Interesting, can you post some code?
threadst equ 20000000 ;threadstack size ca 20MB
.data
varx dd 0,0,0,0 ;receives the thread ID
thread1 dd 0,0,0,0 ;thread handles
threadcnt dd 0
.code
inc threadcnt
mov esi,threadcnt
;second argument = thread stack size, or NULL for the default (1 MB)
;third argument = ADDR of your worker thread
mov thread1, rv(CreateThread,NULL,threadst,ADDR workerthread,esi,NULL,ADDR varx)
Oops, that chokes with lots of assembly errors. Did you test it? Can you post the complete code, please?
Quote from: daydreamer on March 18, 2022, 01:55:33 AM
threadst equ 20000000 ;threadstack size ca 20MB
mov thread1, rv(CreateThread,NULL,threadst,ADDR workerthread,esi,NULL,ADDR varx)
And now function workerthread can use 20 MB as locals?