The MASM Forum

General => The Campus => Topic started by: frktons on December 23, 2012, 05:23:05 AM

Title: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 05:23:05 AM
Hi friends. Happy Holidays.

I'm trying to understand whether it is possible, when allocating memory, to
have it aligned on a 4-, 8- or 16-byte boundary. I haven't found any exhaustive info so far.
The second point that is still not clear to me is when it is
important to deallocate/free previously allocated memory.

Thanks for your help.

Frank
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: dedndave on December 23, 2012, 05:53:18 AM
it's already aligned
i think by 16

if it were not...
you could allocate 15 extra bytes
store the returned address to be used by the Free function
then ADD 15 to it, then AND it with 0FFFFFFF0h and use that as an aligned access address
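
A minimal sketch of that trick, written in C rather than MASM only so it can be compiled anywhere. It assumes plain malloc/free as the allocator, and the names aligned16_block, aligned16_alloc and aligned16_free are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: the raw pointer must be kept for free(). */
typedef struct {
    void *raw;      /* what malloc returned; this is what gets freed */
    void *aligned;  /* first 16-byte boundary at or after raw */
} aligned16_block;

/* Round an address up to a multiple of 16: (addr + 15) AND NOT 15. */
static uintptr_t align_up_16(uintptr_t addr)
{
    return (addr + 15u) & ~(uintptr_t)15u;
}

static aligned16_block aligned16_alloc(size_t size)
{
    aligned16_block b = { NULL, NULL };
    if (size > SIZE_MAX - 15)
        return b;                       /* size + 15 would overflow */
    b.raw = malloc(size + 15);          /* 15 spare bytes cover the worst case */
    if (b.raw != NULL) {
        b.aligned = (void *)align_up_16((uintptr_t)b.raw);
        assert(((uintptr_t)b.aligned & 15u) == 0);
    }
    return b;
}

static void aligned16_free(aligned16_block *b)
{
    free(b->raw);                       /* never free the adjusted pointer */
    b->raw = NULL;
    b->aligned = NULL;
}
```

The same two-pointer idea should carry over to GlobalAlloc/GlobalFree: keep the returned address for the Free call and use only the adjusted one for aligned accesses.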
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 06:02:45 AM
Quote from: dedndave on December 23, 2012, 05:53:18 AM
it's already aligned
i think by 16

if it were not...
you could allocate 15 extra bytes
store the returned address to be used by the Free function
then ADD 15 to it, then AND it with 0FFFFFFF0h and use that as an aligned access address

According to MSDN:
Quote
http://msdn.microsoft.com/en-us/library/aa366574(v=vs.85).aspx

Memory allocated with this function is guaranteed to be aligned on an 8-byte boundary.
To execute dynamically generated code, use the VirtualAlloc function to allocate memory
and the VirtualProtect function to grant PAGE_EXECUTE access.

It seems the guaranteed alignment is 8 bytes. For 16-byte alignment
something like the method you suggested is probably needed.
I'll run some tests with GlobalAlloc/HeapAlloc and see what alignment
I get.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Donkey on December 23, 2012, 06:03:15 AM
GlobalAlloc is aligned on 8 byte boundaries.

// Sorry frktons, I was typing while you answered....
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 06:09:36 AM
Quote from: Donkey on December 23, 2012, 06:03:15 AM
GlobalAlloc is aligned on 8 byte boundaries.

// Sorry frktons, I was typing while you answered....

Thanks. No problem Edgar.
I recall that in an old program we used GlobalAlloc to allocate
16 MB and then accessed it with SSE2 instructions and XMM registers,
which require 16-byte alignment.
And it worked. I wonder why.
Shouldn't it have thrown a General Protection Fault?


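      invoke GlobalAlloc,GMEM_ZEROINIT or GMEM_FIXED,16*1024*1024
      mov DataPtr, eax
.....
      mov edx, DataPtr
      lea ecx, [edx+16000000]      ; end address (200000 passes of 80 bytes)
      mov eax, 20202020h           ; four ASCII spaces
      movd xmm0, eax
      pshufd xmm0, xmm0, 0         ; copy the dword into all four lanes
      movdqa xmm1, xmm0
      movdqa xmm2, xmm0
      movdqa xmm3, xmm0
      movdqa xmm4, xmm0

@@:
      movdqa [edx], xmm0           ; movdqa faults on an address that is not 16-byte aligned
      movdqa [edx + 16], xmm1
      movdqa [edx + 32], xmm2
      movdqa [edx + 48], xmm3
      movdqa [edx + 64], xmm4
      add edx, 80
      cmp edx, ecx
      jb @B                        ; unsigned compare, because these are addresses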

Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 06:41:12 AM
OK. The test shows that the GlobalAlloc() API returns memory on a 16-byte
boundary, at least on my system and
with a size of 16 MB.

;-----------------------------------------------------
; TestAlloc.asm
;-----------------------------------------------------
; Test 10 times the GlobalAlloc API and displays the
; alignment of allocated memory blocks
; frktons - 22-dec-2012
;-----------------------------------------------------

.nolist
include \masm32\include\masm32rt.inc
.686

.DATA

    DataPtr  dd 0

.CODE

start:

    mov ecx, 10

alloc_again:

    push ecx

    invoke GlobalAlloc,GMEM_ZEROINIT or GMEM_FIXED,16*1024*1024
    mov DataPtr, eax 

    and eax, 15

    .IF ( eax == 0)

        print " The allocated memory is 16 bytes aligned", 13, 10
        jmp  next_alloc

    .ENDIF

    mov eax, DataPtr
    and eax, 7

    .IF ( eax == 0)

        print " The allocated memory is 8 bytes aligned", 13, 10

    .ENDIF

next_alloc:

    invoke GlobalFree, DataPtr

    pop  ecx
    dec  ecx
    jnz  alloc_again   
   
end_test:
    inkey chr$(13, 10, "--- ok ---", 13)
    exit

end start


Quote
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned

--- ok ---

Maybe if the memory to allocate is not a multiple of 16, it
uses the default 8 bytes alignment.
Another test is needed, I think.

If you test it on your system, would you confirm this data?
Thanks
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Donkey on December 23, 2012, 06:43:48 AM
Hi frktons,

The docs only guarantee 8 byte alignment, that doesn't necessarily mean it will be 8 bytes, just that you cannot assume it will be greater than that. You're best to stay away from GlobalAlloc for anything that requires alignment anyway, VirtualAlloc is a faster and more flexible function.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 06:52:21 AM
Quote from: Donkey on December 23, 2012, 06:43:48 AM
Hi frktons,

The docs only guarantee 8 byte alignment, that doesn't necessarily mean it will be 8 bytes, just that you cannot assume it will be greater than that. You're best to stay away from GlobalAlloc for anything that requires alignment anyway, VirtualAlloc is a faster and more flexible function.

Yes Edgar, that's probably wise advice. Anyway, if you run some tests
with your favorite API, please let me know.

I used a size of 1024*1024+15 bytes, but I had the same result.
This makes me suspect there is something else that the docs don't say.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: jj2007 on December 23, 2012, 07:37:04 AM
VirtualAlloc guarantees a 16-byte (actually: 4096-byte) alignment, but it is much slower than HeapAlloc for small allocations. Use HeapAlloc for many small non-SSE2 allocs and VirtualAlloc for the rest.

GlobalAlloc follows basically the same strategy, that's why you most probably get 16-byte alignment for fat chunks. But it is not documented...
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 09:10:40 AM
Quote from: jj2007 on December 23, 2012, 07:37:04 AM
VirtualAlloc guarantees a 16-byte (actually: 4096-byte) alignment, but it is much
slower than HeapAlloc for small allocations. Use HeapAlloc for many small
non-SSE2 allocs and VirtualAlloc for the rest.

GlobalAlloc follows basically the same strategy, that's why you most probably get
16-byte alignment for fat chunks. But it is not documented...

Thanks Jochen.

How many KB is a small allocation?
I've read in MSDN that HeapAlloc is preferable to GlobalAlloc:
Quote
Note  The global functions have greater overhead and provide fewer
features than other memory management functions. New applications
should use the heap functions unless documentation states that a
global function should be used. For more information, see Global and
Local Functions.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: hutch-- on December 23, 2012, 09:21:34 AM
Frank,

The only safe way with allocated memory is to allocate more than you need then align the front and set a pointer to it. Then you don't have to guess or hope that it will have the alignment you require on different OS versions.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: jj2007 on December 23, 2012, 09:34:52 AM
Quote from: frktons on December 23, 2012, 09:10:40 AM
How many KB is a small allocation?
I've read in MSDN that HeapAlloc is preferable to GlobalAlloc

We have made up a testbed some time ago, and the results are somewhat confusing. The MSDN rule of thumb seems to be ok but it depends on other factors, too. Test yourself...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
183     cycles per kByte for HeapAlloc       00004000h bytes (16 kB)
876     cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
40      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)

174     cycles per kByte for HeapAlloc       00008000h bytes (32 kB)
768     cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
20      cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)

1675    cycles per kByte for HeapAlloc       00010000h bytes (64 kB)
683     cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
211     cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)

192     cycles per kByte for HeapAlloc       00020000h bytes (128 kB)
648     cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
10      cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)

188     cycles per kByte for HeapAlloc       00040000h bytes (256 kB)
615     cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
7       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)

874     cycles per kByte for HeapAlloc       00080000h bytes (512 kB)
873     cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
881     cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)

1107    cycles per kByte for HeapAlloc       00100000h bytes (1 MB)
1107    cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
1107    cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: qWord on December 23, 2012, 09:35:56 AM
If already using the CRT, _aligned_malloc (http://msdn.microsoft.com/en-us/library/8z34s9c6(v=vs.80).aspx) may also be an option.
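
A hedged sketch of how that could look from C. On MSVC it calls _aligned_malloc/_aligned_free; elsewhere it falls back to C11 aligned_alloc so the same file still builds. The wrapper names xaligned_alloc and xaligned_free are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _MSC_VER
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Hypothetical wrapper; alignment must be a power of two. */
static void *xaligned_alloc(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#ifdef _MSC_VER
    return _aligned_malloc(size, alignment);
#else
    /* C11 aligned_alloc expects size to be a multiple of alignment. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
#endif
}

static void xaligned_free(void *p)
{
#ifdef _MSC_VER
    _aligned_free(p);   /* blocks from _aligned_malloc must not go to free() */
#else
    free(p);
#endif
}
```

From MASM32 the same CRT routines would presumably be reached through the crt_ prefixed prototypes, but that part is not shown here.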
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 10:23:37 AM
Quote from: jj2007 on December 23, 2012, 09:34:52 AM

We have made up a testbed some time ago, and the results are somewhat confusing.
The MSDN rule of thumb seems to be ok but it depends on other factors, too. Test yourself...

Yea!
my testbed says also strange things:
Quote
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
277     cycles per kByte for HeapAlloc       00004000h bytes (16 kB)
1861    cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
92      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)

267     cycles per kByte for HeapAlloc       00008000h bytes (32 kB)
1555    cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
47      cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)

321     cycles per kByte for HeapAlloc       00010000h bytes (64 kB)
1429    cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
27      cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)

305     cycles per kByte for HeapAlloc       00020000h bytes (128 kB)
866     cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
11      cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)

300     cycles per kByte for HeapAlloc       00040000h bytes (256 kB)
1219    cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
9       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)

1207    cycles per kByte for HeapAlloc       00080000h bytes (512 kB)
1205    cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
817     cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)

820     cycles per kByte for HeapAlloc       00100000h bytes (1 MB)
823     cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
811     cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)

Memory was touched

Probably from 1 MB upwards HeapAlloc becomes faster.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 10:27:41 AM
Quote from: hutch-- on December 23, 2012, 09:21:34 AM
Frank,

The only safe way with allocated memory is to allocate more than
you need then align the front and set a pointer to it. Then you don't
have to guess or hope that it will have the alignment you require on
different OS versions.

Yes Hutch, I'm considering this advice as a guideline for cases in
which it is not possible to be certain of the alignment.

Quote from: qWord on December 23, 2012, 09:35:56 AM
If already using the CRT, _aligned_malloc (http://msdn.microsoft.com/en-us/library/8z34s9c6(v=vs.80).aspx) may also be an option.

A small example could help me to see the possibilities of this.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: hutch-- on December 23, 2012, 10:29:31 AM
Some of the later memory allocation strategies were designed to ease the usage for people who don't know how to manage memory properly. A big single allocation chopped up to suit the app is always faster than many messy small allocations. The OS is pretty good at handling it but if you understand what you are doing, you do it better yourself.
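
As a rough illustration of "one big allocation chopped up", here is a minimal bump allocator in C. It is only a sketch: malloc stands in for whichever OS call supplies the big block, every slice is rounded up to a 16-byte boundary, and the names arena, arena_init, arena_alloc and arena_destroy are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump allocator: one big block handed out in aligned slices. */
typedef struct {
    unsigned char *base;  /* pointer returned by malloc, kept for free() */
    size_t capacity;      /* total bytes in the block */
    size_t used;          /* offset of the next free byte */
} arena;

static int arena_init(arena *a, size_t capacity)
{
    assert(a != NULL);
    a->base = malloc(capacity);
    a->capacity = (a->base != NULL) ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Returns NULL when the arena is exhausted; slices are never freed one by one. */
static void *arena_alloc(arena *a, size_t size)
{
    uintptr_t start = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + 15u) & ~(uintptr_t)15u;
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->capacity || size > a->capacity - offset)
        return NULL;
    a->used = offset + size;
    return a->base + offset;
}

/* Releases every slice at once by freeing the single big block. */
static void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

Each arena_alloc is a few additions and a compare, and the whole lot is released with one free, which is where the speed advantage over many small allocations would come from.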
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 23, 2012, 11:23:40 AM
Hutch, in the future, when I know much better what I'm doing,
I'll manage this stuff myself.

For a 16 MB allocation HeapAlloc looks a little bit faster:
Quote
16.146  cycles for HeapAlloc
16.298  cycles for GlobalAlloc

16.190  cycles for HeapAlloc
16.741  cycles for GlobalAlloc

16.437  cycles for HeapAlloc
16.644  cycles for GlobalAlloc

16.062  cycles for HeapAlloc
16.732  cycles for GlobalAlloc


--- ok ---

So far I didn't find any concrete difference among the
various APIs used for allocating/freeing memory.
The Heap family seems to be richer in options, but
for simple stuff they look almost the same.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: MichaelW on December 23, 2012, 02:14:30 PM
Quote from: frktons on December 23, 2012, 10:27:41 AM
A small example could help me to see the possibilities of this.

http://masm32.com/board/index.php?topic=508.msg3930#msg3930


Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Greenhorn on December 24, 2012, 04:39:47 AM
Quote from: frktons on December 23, 2012, 11:23:40 AM
So far I didn't find any concrete difference among the
various APIs used for allocating/freeing memory.
The Heap family seems to be richer in options, but
for simple stuff they look almost the same.
GlobalAlloc/LocalAlloc are only present for backward compatibility with 16-bit Windows. They are just wrappers that call HeapAlloc.
Global and Local Functions (http://msdn.microsoft.com/en-us/library/aa366596%28v=vs.85%29.aspx)

For memory allocations up to 1MB HeapAlloc should be used. For allocations greater than 1MB VirtualAlloc should be used (which is called by HeapAlloc if size is more than 1MB).

Greenhorn
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: dedndave on December 24, 2012, 06:33:00 AM
i know the documents say that

then, i see certain operations that state that GlobalAlloc is required - lol
examples are reading from a resource or transfers to/from the clipboard (and some GDI+ stuff)
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 24, 2012, 08:14:21 AM
So far the use of HeapAlloc/Create/Free/Destroy... looks like the
solution that includes all the cases [less than 1MB / more than 1MB],
GlobalAlloc/LocalAlloc being just wrappers to call HeapAlloc, and
VirtualAlloc called by HeapAlloc when needed.

The test made by Michael [about alignment] shows that, if you allocate
more than one buffer without freeing the previous ones, you can also get
8-byte alignment, not only 16 and its multiples.
::)
Quote
Win7, x64

290e80h
2937d0h
293fc0h

290e78h 8
2937c0h 64
293bb0h 16
293fa0h 32
294390h 16
294780h 128
294b70h 16
294f60h 32
295350h 16
295740h 64

295b40h 64
295f40h 64
296350h 16
296750h 16
296b60h 32
296f60h 32
297370h 16
297770h 16
297b80h 128
297f80h 128

2983c0h 64
2987c0h 64
298c00h 1024
299040h 64
299480h 128
2998c0h 64
299d00h 256
29a140h 64
29a580h 128
29a980h 128
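
The second column in the listing above looks like the largest power of two that divides each address (290e78h gives 8, 298c00h gives 1024). That reading is an assumption drawn from the values, not from Michael's source. A small C sketch of such a check, with made-up names address_alignment and is_aligned:

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides addr, i.e. its lowest set bit.
   Returns 0 for a null address. */
static uintptr_t address_alignment(uintptr_t addr)
{
    return addr & (~addr + 1u);
}

/* Nonzero if addr is a multiple of n; n must be a power of two. */
static int is_aligned(uintptr_t addr, uintptr_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (addr & (n - 1)) == 0;
}
```

For SSE2 work only is_aligned(addr, 16) matters; the larger values in the listing are just what the heap happened to return.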
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: hutch-- on December 24, 2012, 09:01:49 AM
 :biggrin:

Contrary to popular opinion, GlobalAlloc() with the GMEM_FIXED flag calls the same function in NTDLL.DLL as many of the other memory allocation strategies and it has always been fast and it can allocate any amount of memory within physical limits and OS address range. While the other strategies do the job, GlobalAlloc() is fast, flexible and can handle the limits while being easier to use.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 24, 2012, 09:45:16 AM
Quote from: hutch-- on December 24, 2012, 09:01:49 AM
:biggrin:

Contrary to popular opinion, GlobalAlloc() with the GMEM_FIXED flag calls the same function in NTDLL.DLL as many of the other memory allocation strategies and it has always been fast and it can allocate any amount of memory within physical limits and OS address range. While the other strategies do the job, GlobalAlloc() is fast, flexible and can handle the limits while being easier to use.
:t
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Tedd on December 29, 2012, 05:13:44 AM
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 29, 2012, 07:19:39 AM
Quote from: Tedd on December 29, 2012, 05:13:44 AM
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.

Yes Tedd, that's what they say.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Tedd on December 29, 2012, 08:13:06 AM
Quote from: frktons on December 29, 2012, 07:19:39 AM
Quote from: Tedd on December 29, 2012, 05:13:44 AM
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.

Yes Tedd, that's what they say.
Saying it doesn't make it true, but it's easily verified.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 29, 2012, 08:23:34 AM
Quote from: Tedd on December 29, 2012, 08:13:06 AM
Quote from: frktons on December 29, 2012, 07:19:39 AM
Quote from: Tedd on December 29, 2012, 05:13:44 AM
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.

Yes Tedd, that's what they say.
Saying it doesn't make it true, but it's easily verified.
I mean it is what the official documentation says. And of course
it can be verified. :t
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Vortex on December 29, 2012, 07:53:58 PM
Hi Jochen,

Did you test malloc exported by msvcrt.dll? How is the result?
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: dedndave on December 30, 2012, 12:46:14 AM
wouldn't msvcrt require some initialization ?
so hard to tell - lol
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Vortex on December 30, 2012, 04:04:43 AM
Quote from: dedndave on December 30, 2012, 12:46:14 AM
wouldn't msvcrt require some initialization ?
so hard to tell - lol

Hi Dave,

No need for initialization :

include     \masm32\include\masm32rt.inc

BUFFER_SIZE = 64

.data

BuffSize    dd BUFFER_SIZE

.data?

pMem        dd ?

.code

start:

    invoke  crt_malloc,BUFFER_SIZE
    test    eax,eax
    jz      @f
    mov     pMem,eax

    invoke  GetComputerName,eax,ADDR BuffSize
    invoke  StdOut,pMem
    invoke  crt_free,pMem
@@:
    invoke  ExitProcess,0

END start


http://msdn.microsoft.com/en-us/library/aa246461(v=vs.60).aspx
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: jj2007 on December 30, 2012, 05:52:44 AM
Quote from: Vortex on December 29, 2012, 07:53:58 PM
Hi Jochen,

Did you test malloc exported by msvcrt.dll? How is the result?

Here it is. And the results are interesting :P

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
32      cycles per kByte for HeapAlloc plain 00004000h bytes (16 kB)
174     cycles per kByte for HeapAlloc ZINIT 00004000h bytes (16 kB)
868     cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
40      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)
52      cycles per kByte for malloc          00004000h bytes (16 kB)

17      cycles per kByte for HeapAlloc plain 00008000h bytes (32 kB)
169     cycles per kByte for HeapAlloc ZINIT 00008000h bytes (32 kB)
752     cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
20      cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)
29      cycles per kByte for malloc          00008000h bytes (32 kB)

657     cycles per kByte for HeapAlloc plain 00010000h bytes (64 kB)
601     cycles per kByte for HeapAlloc ZINIT 00010000h bytes (64 kB)
685     cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
17      cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)
661     cycles per kByte for malloc          00010000h bytes (64 kB)

11      cycles per kByte for HeapAlloc plain 00020000h bytes (128 kB)
189     cycles per kByte for HeapAlloc ZINIT 00020000h bytes (128 kB)
622     cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
10      cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)
183     cycles per kByte for malloc          00020000h bytes (128 kB)

8       cycles per kByte for HeapAlloc plain 00040000h bytes (256 kB)
185     cycles per kByte for HeapAlloc ZINIT 00040000h bytes (256 kB)
645     cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
7       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)
9       cycles per kByte for malloc          00040000h bytes (256 kB)

999     cycles per kByte for HeapAlloc plain 00080000h bytes (512 kB)
987     cycles per kByte for HeapAlloc ZINIT 00080000h bytes (512 kB)
1001    cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
1022    cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)
996     cycles per kByte for malloc          00080000h bytes (512 kB)

1165    cycles per kByte for HeapAlloc plain 00100000h bytes (1 MB)
1153    cycles per kByte for HeapAlloc ZINIT 00100000h bytes (1 MB)
1165    cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
1173    cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)
1154    cycles per kByte for malloc          00100000h bytes (1 MB)

1247    cycles per kByte for HeapAlloc plain 00200000h bytes (2 MB)
1249    cycles per kByte for HeapAlloc ZINIT 00200000h bytes (2 MB)
1252    cycles per kByte for VirtualAlloc    00200000h bytes (2 MB)
1236    cycles per kByte for GlobalAlloc     00200000h bytes (2 MB)
1238    cycles per kByte for malloc          00200000h bytes (2 MB)

590     cycles per kByte for HeapAlloc plain 00400000h bytes (4 MB)
591     cycles per kByte for HeapAlloc ZINIT 00400000h bytes (4 MB)
600     cycles per kByte for VirtualAlloc    00400000h bytes (4 MB)
592     cycles per kByte for GlobalAlloc     00400000h bytes (4 MB)
586     cycles per kByte for malloc          00400000h bytes (4 MB)
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Tedd on December 30, 2012, 06:50:05 AM
Quote from: jj2007 on December 30, 2012, 05:52:44 AM
Here it is. And the results are interesting :P
The results are likely misleading -- I expect you'd get different results if you change the order of the tests.

Each allocation affects the state of memory within the process, and the state of Windows' memory manager.
This unfortunately means that the time taken for any particular allocation is dependent on what previous allocations have been made globally, and potentially by what method.

To get more representative results: randomize the order of all allocation tests (mix up sizes and methods) and repeat the tests multiple times, then take average allocation times.
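
A rough sketch of that procedure in C, under loud assumptions: malloc and calloc stand in for the Win32 allocators being compared, clock() stands in for a cycle counter, and all names (bench_case, shuffle, run_rounds, average_ticks) are invented for the example. Each round visits the cases in a freshly shuffled order and the ticks are averaged at the end:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { METHOD_MALLOC, METHOD_CALLOC };

typedef struct {
    int method;          /* which allocator to call */
    size_t size;         /* bytes to request */
    double total_ticks;  /* clock() ticks summed over all rounds */
    unsigned runs;       /* number of measurements taken */
} bench_case;

static volatile unsigned char sink;  /* keeps the memory touch from being optimized out */

/* Fisher-Yates shuffle of an index array, so every round uses a new order. */
static void shuffle(size_t *idx, size_t n)
{
    assert(idx != NULL);
    for (size_t i = n; i > 1; --i) {
        size_t j = (size_t)rand() % i;
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* One measurement: allocate, touch every byte, free. */
static clock_t time_one(const bench_case *c)
{
    clock_t t0 = clock();
    unsigned char *p = (c->method == METHOD_CALLOC) ? calloc(1, c->size)
                                                    : malloc(c->size);
    if (p != NULL) {
        memset(p, 0x20, c->size);
        if (c->size > 0)
            sink = p[c->size - 1];
        free(p);
    }
    return clock() - t0;
}

static void run_rounds(bench_case *cases, size_t n, unsigned rounds)
{
    size_t *idx = malloc(n * sizeof *idx);
    if (idx == NULL)
        return;
    for (size_t i = 0; i < n; ++i)
        idx[i] = i;
    for (unsigned r = 0; r < rounds; ++r) {
        shuffle(idx, n);                 /* mix sizes and methods */
        for (size_t k = 0; k < n; ++k) {
            bench_case *c = &cases[idx[k]];
            c->total_ticks += (double)time_one(c);
            c->runs++;
        }
    }
    free(idx);
}

static double average_ticks(const bench_case *c)
{
    return c->runs ? c->total_ticks / c->runs : 0.0;
}
```

Seeding rand() differently per run and using many rounds should smooth out the order effects; clock() is far too coarse for single small allocations, so a real test would swap in a proper cycle timer.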
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: jj2007 on December 30, 2012, 06:58:45 AM
Quote from: Tedd on December 30, 2012, 06:50:05 AM
The results are likely misleading -- I expect you'd get different results if you change the order of the tests.

For you, Tedd. They look totally different because the order is reversed.
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
50      cycles per kByte for malloc          00004000h bytes (16 kB)
50      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)
889     cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
174     cycles per kByte for HeapAlloc ZINIT 00004000h bytes (16 kB)
26      cycles per kByte for HeapAlloc plain 00004000h bytes (16 kB)

31      cycles per kByte for malloc          00008000h bytes (32 kB)
25      cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)
770     cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
169     cycles per kByte for HeapAlloc ZINIT 00008000h bytes (32 kB)
13      cycles per kByte for HeapAlloc plain 00008000h bytes (32 kB)

686     cycles per kByte for malloc          00010000h bytes (64 kB)
682     cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)
686     cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
602     cycles per kByte for HeapAlloc ZINIT 00010000h bytes (64 kB)
14      cycles per kByte for HeapAlloc plain 00010000h bytes (64 kB)

189     cycles per kByte for malloc          00020000h bytes (128 kB)
13      cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)
628     cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
190     cycles per kByte for HeapAlloc ZINIT 00020000h bytes (128 kB)
9       cycles per kByte for HeapAlloc plain 00020000h bytes (128 kB)

9       cycles per kByte for malloc          00040000h bytes (256 kB)
9       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)
595     cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
187     cycles per kByte for HeapAlloc ZINIT 00040000h bytes (256 kB)
6       cycles per kByte for HeapAlloc plain 00040000h bytes (256 kB)

600     cycles per kByte for malloc          00080000h bytes (512 kB)
595     cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)
605     cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
605     cycles per kByte for HeapAlloc ZINIT 00080000h bytes (512 kB)
595     cycles per kByte for HeapAlloc plain 00080000h bytes (512 kB)

786     cycles per kByte for malloc          00100000h bytes (1 MB)
785     cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)
790     cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
811     cycles per kByte for HeapAlloc ZINIT 00100000h bytes (1 MB)
800     cycles per kByte for HeapAlloc plain 00100000h bytes (1 MB)

1054    cycles per kByte for malloc          00200000h bytes (2 MB)
1054    cycles per kByte for GlobalAlloc     00200000h bytes (2 MB)
1091    cycles per kByte for VirtualAlloc    00200000h bytes (2 MB)
1066    cycles per kByte for HeapAlloc ZINIT 00200000h bytes (2 MB)
1057    cycles per kByte for HeapAlloc plain 00200000h bytes (2 MB)

612     cycles per kByte for malloc          00400000h bytes (4 MB)
609     cycles per kByte for GlobalAlloc     00400000h bytes (4 MB)
613     cycles per kByte for VirtualAlloc    00400000h bytes (4 MB)
594     cycles per kByte for HeapAlloc ZINIT 00400000h bytes (4 MB)
614     cycles per kByte for HeapAlloc plain 00400000h bytes (4 MB)
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Tedd on December 30, 2012, 08:00:11 AM
Yup...


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Run A
        HAlloc  HAllocZ VAlloc  GAlloc  malloc
16k       32     174     868      40      52
32k       17     169     752      20      29
64k      657     601     685      17     661
128k      11     189     622      10     183
256k       8     185     645       7       9
512k     999     987    1001    1022     996
1M      1165    1153    1165    1173    1154
2M      1247    1249    1252    1236    1238
4M       590     591     600     592     586

Run B
        HAlloc  HAllocZ VAlloc  GAlloc  malloc
16k       26     174     889      50      50
32k       13     169     770      25      31
64k       14     602     686     682     686
128k       9     190     628      13     189
256k       6     187     595       9       9
512k     595     605     605     595     600
1M       800     811     790     785     786
2M      1057    1066    1091    1054    1054
4M       614     594     613     609     612


Abs. Diff.
        HAlloc  HAllocZ VAlloc  GAlloc  malloc
16k        6       0      21      10       2
32k        4       0      18       5       2
64k      643       1       1     665      25
128k       2       1       6       3       6
256k       2       2      50       2       0
512k     404     382     396     427     396
1M       365     342     375     388     368
2M       190     183     161     182     184
4M        24       3      13      17      26



Quote from: Tedd on December 30, 2012, 06:50:05 AM
To get more representative results: randomize the order of all allocation tests (mix up sizes and methods) and repeat the tests multiple times, then take average allocation times.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: jj2007 on December 30, 2012, 08:09:54 AM
Tedd, you are perfectly right. I have no time to do it, but I think everybody will appreciate it if you implement that idea.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: frktons on December 30, 2012, 10:09:28 AM
Quote from: jj2007 on December 30, 2012, 08:09:54 AM
Tedd, you are perfectly right. I have no time to do it, but I think everybody will appreciate if you realise that idea.
Agreed. If Tedd posts these results it will be quite useful indeed.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: hutch-- on December 30, 2012, 12:52:54 PM
I think from memory that anything under 64k is a waste of time due to page size in any memory allocation strategy. The other factor of course is that once you have memory allocated, they all perform the same, memory is memory. Apart from the normal strategies like Heap, Global and Virtual allocs, OLE string memory was useful as it was more closely connected to the OS and appeared to come from a different address range. It was a bit slower in allocation but performed fine once allocated.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: sinsi on December 30, 2012, 01:06:13 PM
Don't forget things like page tables, TLB and the like.
Windows also has an idle thread that zeroes free pages, so ZERO_INIT times will be everywhere too.
Title: Re: GlobalAlloc, GlobalFree, Malloc and friends
Post by: Don57 on January 24, 2013, 07:31:06 AM
It's been a few years since I've used GlobalAlloc, so I checked the MS documentation, and I guess MS considers these functions obsolete.

The global functions have greater overhead and provide fewer features than other memory management functions. New applications should use the heap functions unless documentation states that a global function should be used. For more information, see Global and Local Functions.