a handy little function :P
MemStrategy PROC nBytes:DWORD
;Determine method of memory allocation
;DednDave, 10-2013
;Allocation is attempted in the following order:
; 1) stack
; 2) heap
; 3) not available - the caller may use VirtualAlloc or otherwise alter strategy
;-----------------------------------
;Call With: nBytes = number of bytes requested
;
; Returns: If requested allocation is available on stack:
; EAX = 0
; ECX = total available stack space, less reserved bytes
; EDX = current StackLimit, from FS:[8]
;
; If requested allocation is available on heap:
; EAX = address of allocated block (block is zeroed and must be freed)
; ECX = 0
; EDX = hHeap, process heap handle
;
; If requested allocation not available on stack or heap:
; EAX = 0
; ECX = 0
; EDX = hHeap, process heap handle
;-----------------------------------
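_StackReserved = 4096                           ;bytes held back from the stack figure

        mov     edx,nBytes
        lea     ecx,[esp+8-_StackReserved]      ;ECX = stack figure returned to the caller
        .if edx<=ecx
            neg     edx
            xor     eax,eax                     ;EAX = 0: probe value and return value
            lea     edx,[edx+esp+8]             ;EDX = lowest address the request reaches
            ASSUME  FS:Nothing
            .repeat
                push    eax                     ;touch the dword below ESP; below StackLimit
                                                ;this hits the guard page and commits a page
                mov     esp,fs:[8]              ;ESP = updated StackLimit
            .until edx>=esp                     ;stop once the request is covered
            ASSUME  FS:ERROR
            mov     edx,esp                     ;EDX = current StackLimit
            lea     esp,[ecx+_StackReserved-8]  ;restore the original ESP
        .else
            push    edx                         ;keep nBytes across the API call
            INVOKE  GetProcessHeap
            pop     edx
            push    eax                         ;save hHeap
            INVOKE  HeapAlloc,eax,HEAP_ZERO_MEMORY,edx
            pop     edx                         ;EDX = hHeap
            xor     ecx,ecx                     ;ECX = 0: nothing on the stack
        .endif
        ret

MemStrategy ENDP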
Dave,
Quote from: dedndave on October 25, 2013, 02:03:44 AM
a handy little function :P
yes, a nice and handy tool. :t Thank you for providing the code.
Gunther
Dave,
If the requested allocation is available on the stack, shouldn't eax return the pointer to the buffer, like StackBuffer() (http://masm32.com/board/index.php?topic=94.msg22404#msg22404)?
We should also test the speed of n*push 0 vs n*stosd ;)
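        push    edi
        push    ecx
        mov     edi, esp                ; careful: with DF=1 the first stosd lands on [esp], the saved ecx
        mov     ecx, bufsize/4          ; dword count
        xor     eax, eax
        std                             ; store downwards
        rep     stosd
        mov     esp, edi                ; edi = old esp - bufsize
        add     esp, bufsize            ; release stackbuffer
        pop     ecx
        pop     edi
        cld                             ; restore the direction flag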

        push    ecx
        lea     eax, [esp-bufsize]      ; eax = target stack pointer
        align 4
        .Repeat
            push    0                   ; one zeroed dword per pass
        .Until esp<=eax
        add     esp, bufsize            ; release stackbuffer
        pop     ecx
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
+19 of 20 tests valid, loop overhead is approx. 17/10 cycles
471079 cycles for 10 * rep stosd
500744 cycles for 10 * push 0
501473 cycles for 10 * push edx
471304 cycles for 10 * rep stosd
500729 cycles for 10 * push 0
501447 cycles for 10 * push edx
470335 cycles for 10 * rep stosd
500717 cycles for 10 * push 0
501181 cycles for 10 * push edx
22 bytes for rep stosd
18 bytes for push 0
21 bytes for push edx
the way i use it, it simply probes the stack
it does not allocate that space - but it tells the caller the stack is there and ready
i did it that way because - sometimes he wants it initialized - sometimes not
for the heap allocation - you don't get a second chance
and - HEAP_ZERO_MEMORY is pretty fast, as i recall
as for PUSH 0 - i think PUSH immed is slower than PUSH EAX (EAX = 0)
but - i think REP STOSD is still faster
probably the fastest is a discrete loop
anyways - the code is there - modify it to suit your needs on a program-by-program basis :t
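A caller might branch on the three return cases roughly like this. It is only a sketch under assumptions: UseBuffer and BUFSIZE are made-up names, MemStrategy is taken to be assembled in the same module (or declared with a PROTO) with the usual MASM32 includes, and BUFSIZE is a multiple of 4.

```asm
; hypothetical caller - UseBuffer and BUFSIZE are illustration names only
BUFSIZE = 65536

UseBuffer PROC
        INVOKE  MemStrategy,BUFSIZE
        .if ecx                         ;stack case: space is probed, not allocated
            sub     esp,BUFSIZE         ;claim it; the contents are NOT initialized
            mov     edx,esp             ;EDX = buffer address
            ;... use [edx] to [edx+BUFSIZE-1] ...
            add     esp,BUFSIZE         ;release it
        .elseif eax                     ;heap case: EAX = zeroed block, EDX = hHeap
            push    edx                 ;save hHeap
            push    eax                 ;save block address
            ;... use the block ...
            pop     eax
            pop     edx
            INVOKE  HeapFree,edx,0,eax  ;the block must be freed
        .else
            ;neither worked: try VirtualAlloc or ask for less
        .endif
        ret
UseBuffer ENDP
```

The stack branch leaves initialization to the caller, which matches the "sometimes he wants it initialized - sometimes not" idea; the heap branch gets zeroed memory from HEAP_ZERO_MEMORY.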
Quote from: dedndave on October 25, 2013, 07:52:10 AM
as for PUSH 0 - i think PUSH immed is slower than PUSH EAX (EAX = 0)
but - i think REP STOSD is still faster
See new timings and attachment above.
Quote
probably the fastest is a discrete loop
What do you mean?
what i mean is - SUB ESP, something, then.....
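        xor     eax,eax
        mov     edi,esp                 ;assumed setup: EDI = start of the block after the SUB ESP
        mov     ecx,bufsize/16          ;assumed setup: 16 bytes per pass
loop00: mov     [edi],eax
        mov     [edi+4],eax
        mov     [edi+8],eax
        mov     [edi+12],eax
        dec     ecx
        lea     edi,[edi+16]            ;LEA leaves the flags from DEC alone
        jnz     loop00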
or something similar
you could even
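        xor     eax,eax
        mov     ecx,bufsize/16          ;assumed setup: 16 bytes per pass
loop00: sub     esp,16
        dec     ecx                     ;sets ZF for the JNZ; the MOVs do not touch flags
        mov     [esp],eax
        mov     [esp+4],eax
        mov     [esp+8],eax
        mov     [esp+12],eax
        jnz     loop00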