MemStrategy

Started by dedndave, October 25, 2013, 02:03:44 AM


dedndave

a handy little function   :P
MemStrategy PROC nBytes:DWORD

;Determine method of memory allocation
;DednDave, 10-2013

;Allocation is attempted in the following order:
;    1) stack
;    2) heap
;    3) not available - the caller may use VirtualAlloc or otherwise alter strategy

;-----------------------------------

;Call With: nBytes = number of bytes requested
;
;  Returns: If requested allocation is available on stack:
;               EAX = 0
;               ECX = total available stack space, less reserved bytes
;               EDX = current StackLimit, from FS:[8]
;
;           If requested allocation is available on heap:
;               EAX = address of allocated block (block is zeroed and must be freed)
;               ECX = 0
;               EDX = hHeap, process heap handle
;
;           If requested allocation not available on stack or heap:
;               EAX = 0
;               ECX = 0
;               EDX = hHeap, process heap handle

;-----------------------------------

_StackReserved = 4096

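    mov     edx,nBytes
    lea     ecx,[esp+8-_StackReserved]      ;ECX = ESP+8 less reserved bytes, used as available stack
    .if edx<=ecx
        neg     edx
        xor     eax,eax                     ;EAX = 0, the stack-case return value
        lea     edx,[edx+esp+8]             ;EDX = lowest address the request would reach

        ASSUME  FS:Nothing

        .repeat
            push    eax                     ;touch the stack just below ESP
            mov     esp,fs:[8]              ;ESP = current StackLimit
        .until edx>=esp                     ;done when StackLimit is at or below the target

        ASSUME  FS:ERROR

        mov     edx,esp                     ;EDX = current StackLimit
        lea     esp,[ecx+_StackReserved-8]  ;restore the original ESP
    .else
        push    edx                         ;save nBytes across the API call
        INVOKE  GetProcessHeap
        pop     edx
        push    eax                         ;save hHeap across the API call
        INVOKE  HeapAlloc,eax,HEAP_ZERO_MEMORY,edx
        pop     edx                         ;EDX = hHeap
        xor     ecx,ecx                     ;ECX = 0 marks the heap or failure case
    .endif
    ret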

MemStrategy ENDP


Gunther

Dave,

Quote from: dedndave on October 25, 2013, 02:03:44 AM
a handy little function   :P

Yes, a nice and handy tool.  :t Thank you for providing the code.

Gunther
You have to know the facts before you can distort them.

jj2007

Dave,

If the requested allocation is available on the stack, shouldn't eax return the pointer to the buffer, like StackBuffer()?

We should also test the speed of n*push 0 vs n*stosd ;)

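        push edi
        push ecx
        mov edi, esp                    ; note: the first stosd lands on [esp], the saved ecx
        mov ecx, bufsize/4
        xor eax, eax
        std                             ; store downwards
        rep stosd                       ; zero bufsize bytes
        mov esp, edi                    ; esp = original esp - bufsize
        add esp, bufsize                ; release stackbuffer
        pop ecx
        pop edi
        cld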

        push ecx
        lea eax, [esp-bufsize]
        align 4
        .Repeat
                push 0
        .Until esp<=eax
        add esp, bufsize
        pop ecx

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
+19 of 20 tests valid, loop overhead is approx. 17/10 cycles

471079  cycles for 10 * rep stosd
500744  cycles for 10 * push 0
501473  cycles for 10 * push edx

471304  cycles for 10 * rep stosd
500729  cycles for 10 * push 0
501447  cycles for 10 * push edx

470335  cycles for 10 * rep stosd
500717  cycles for 10 * push 0
501181  cycles for 10 * push edx

22      bytes for rep stosd
18      bytes for push 0
21      bytes for push edx

dedndave

the way i use it, it simply probes the stack
it does not allocate that space - but it tells the caller the stack is there and ready
i did it that way because - sometimes the caller wants it initialized - sometimes not
for the heap allocation - you don't get a second chance
and - HEAP_ZERO_MEMORY is pretty fast, as i recall

as for PUSH 0 - i think PUSH immed is slower than PUSH EAX (EAX = 0)
but - i think REP STOSD is still faster

probably the fastest is a discrete loop

anyways - the code is there - modify it to suit your needs on a program-by-program basis   :t
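
The three return cases described in the header comments could be handled by a caller roughly as follows. This is only a sketch: UseBuffer and WORKSIZE are hypothetical names, and it assumes masm32rt.inc is included and the MemStrategy PROC from the first post is assembled into the same module.

```asm
; Sketch only: UseBuffer and WORKSIZE are hypothetical, not from the thread.

MemStrategy PROTO :DWORD

WORKSIZE = 65536                        ;hypothetical request size, in bytes

UseBuffer PROC USES ebx esi

    INVOKE  MemStrategy,WORKSIZE
    .if eax                             ;heap case: EAX = zeroed block, EDX = hHeap
        mov     esi,eax
        mov     ebx,edx                 ;EDX is not preserved across API calls
        ; ... use the WORKSIZE bytes at [esi] ...
        INVOKE  HeapFree,ebx,0,esi      ;heap blocks must be freed
    .elseif ecx                         ;stack case: the pages are already probed
        sub     esp,WORKSIZE            ;the caller does the actual allocation
        ; ... use the WORKSIZE bytes at [esp], which are not guaranteed to be zero ...
        add     esp,WORKSIZE            ;release the stack buffer
    .else                               ;neither: change strategy, e.g. VirtualAlloc
        INVOKE  VirtualAlloc,NULL,WORKSIZE,MEM_COMMIT or MEM_RESERVE,PAGE_READWRITE
        .if eax
            mov     esi,eax
            ; ... use the block at [esi] ...
            INVOKE  VirtualFree,esi,0,MEM_RELEASE
        .endif
    .endif
    ret

UseBuffer ENDP
```

Testing EAX first and ECX second follows the documented return values: only the heap case returns a non-zero EAX, and only the stack case returns a non-zero ECX with EAX = 0.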

jj2007

Quote from: dedndave on October 25, 2013, 07:52:10 AM
as for PUSH 0 - i think PUSH immed is slower than PUSH EAX (EAX = 0)
but - i think REP STOSD is still faster

See new timings and attachment above.

Quote: probably the fastest is a discrete loop

What do you mean?

dedndave

what i mean is - SUB ESP, something, then.....

        xor     eax,eax

loop00: mov     [edi],eax
        mov     [edi+4],eax
        mov     [edi+8],eax
        mov     [edi+12],eax
        dec     ecx
        lea     edi,[edi+16]
        jnz     loop00


or something similar

you could even
        xor     eax,eax

loop00: sub     esp,16
        dec     ecx
        mov     [esp],eax
        mov     [esp+4],eax
        mov     [esp+8],eax
        mov     [esp+12],eax
        jnz     loop00
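
For reference, the first unrolled loop above leaves the setup of EDI and ECX to the reader. One way it might be filled in is sketched below. ZeroStackBuf and BUFSIZE are hypothetical names, and BUFSIZE is assumed to be a multiple of 16 and well under one page, so no stack probing is needed beforehand.

```asm
; Sketch only: ZeroStackBuf and BUFSIZE are hypothetical, not from the thread.

BUFSIZE = 1024                          ;assumed: multiple of 16, less than one page

ZeroStackBuf PROC USES edi

    sub     esp,BUFSIZE                 ;reserve the buffer
    mov     edi,esp                     ;EDI = lowest address of the buffer
    mov     ecx,BUFSIZE/16              ;16 bytes are cleared per pass
    xor     eax,eax

zero00: mov     [edi],eax
        mov     [edi+4],eax
        mov     [edi+8],eax
        mov     [edi+12],eax
        dec     ecx
        lea     edi,[edi+16]            ;LEA leaves the flags from DEC intact
        jnz     zero00

    ; ... use the zeroed BUFSIZE bytes at [esp] ...

    add     esp,BUFSIZE                 ;release the buffer
    ret

ZeroStackBuf ENDP
```

For larger buffers the stack would have to be probed page by page first, which is what MemStrategy does in its stack case.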