Hi friends. Happy Holidays.
I'm trying to understand whether it is possible, when allocating memory, to
have it aligned on a 4-, 8- or 16-byte boundary. I haven't found any exhaustive info so far.
The second point I'm still not clear about is: when is it
important to deallocate/free previously allocated memory?
Thanks for your help.
Frank
it's already aligned
i think by 16
if it were not...
you could allocate 15 extra bytes
store the returned address to be used by the Free function
then ADD 15 to it, then AND it with 0FFFFFFF0h and use that as an aligned access address
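The ADD 15 / AND 0FFFFFFF0h step is easy to get wrong by one, so here is the same arithmetic as a minimal sketch in C (used only so it can be compiled anywhere). The helper name align_up is made up for this example and is not part of any API mentioned in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two).
   For alignment = 16 this is exactly "ADD 15, then AND with NOT 15",
   i.e. AND 0FFFFFFF0h on a 32-bit address. Hypothetical helper. */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

For instance, align_up(0x290e78, 16) gives 0x290e80, while an address that is already aligned, such as 0x290e80, comes back unchanged. The pointer returned by the allocator still has to be kept for the Free call.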
Quote from: dedndave on December 23, 2012, 05:53:18 AM
it's already aligned
i think by 16
if it were not...
you could allocate 15 extra bytes
store the returned address to be used by the Free function
then ADD 15 to it, then AND it with 0FFFFFFF0h and use that as an aligned access address
According to MSDN:
Quote
http://msdn.microsoft.com/en-us/library/aa366574(v=vs.85).aspx
says, inter alia
Memory allocated with this function is guaranteed to be aligned on an 8-byte boundary.
To execute dynamically generated code, use the VirtualAlloc function to allocate memory
and the VirtualProtect function to grant PAGE_EXECUTE access.
So the standard allocation seems to be 8-byte aligned, and for 16-byte alignment
I probably need something like what you suggested.
I'll run some tests with GlobalAlloc/HeapAlloc and see what alignment
I get.
GlobalAlloc is aligned on 8 byte boundaries.
// Sorry frktons, I was typing while you answered....
Quote from: Donkey on December 23, 2012, 06:03:15 AM
GlobalAlloc is aligned on 8 byte boundaries.
// Sorry frktons, I was typing while you answered....
Thanks. No problem Edgar.
I recall that in an old program we used GlobalAlloc to allocate
16 MB and then accessed it with SSE2 instructions and XMM registers,
which require 16-byte alignment (movdqa faults on an unaligned address).
And it worked. I'm wondering why.
Shouldn't it have raised a General Protection Fault?
    invoke GlobalAlloc, GMEM_ZEROINIT or GMEM_FIXED, 16*1024*1024
    mov DataPtr, eax
    .....
    mov edx, DataPtr
    lea ecx, [edx+16000000]     ; end address (16000000 is a multiple of 80)
    mov eax, 20202020h          ; four ASCII spaces
    movd xmm0, eax
    pshufd xmm0, xmm0, 0        ; broadcast the dword to all four lanes
    movdqa xmm1, xmm0
    movdqa xmm2, xmm0
    movdqa xmm3, xmm0
    movdqa xmm4, xmm0
@@:
    movdqa [edx], xmm0          ; aligned stores: edx must be 16-byte aligned
    movdqa [edx + 16], xmm1
    movdqa [edx + 32], xmm2
    movdqa [edx + 48], xmm3
    movdqa [edx + 64], xmm4
    add edx, 80
    cmp edx, ecx
    jb @B                       ; unsigned compare, since these are addresses
OK. The test shows that the GlobalAlloc() API allocates memory on
16-byte boundaries, at least on my system and
with a size of 16 MB.
;-----------------------------------------------------
; TestAlloc.asm
;-----------------------------------------------------
; Test 10 times the GlobalAlloc API and displays the
; alignment of allocated memory blocks
; frktons - 22-dec-2012
;-----------------------------------------------------
.nolist
include \masm32\include\masm32rt.inc
.686
.DATA
DataPtr dd 0
.CODE
start:
    mov ecx, 10
alloc_again:
    push ecx
    invoke GlobalAlloc,GMEM_ZEROINIT or GMEM_FIXED,16*1024*1024
    mov DataPtr, eax
    and eax, 15
    .IF ( eax == 0)
        print " The allocated memory is 16 bytes aligned", 13, 10
        jmp next_alloc
    .ENDIF
    mov eax, DataPtr
    and eax, 7
    .IF ( eax == 0)
        print " The allocated memory is 8 bytes aligned", 13, 10
    .ENDIF
next_alloc:
    invoke GlobalFree, DataPtr
    pop ecx
    dec ecx
    jnz alloc_again
end_test:
    inkey chr$(13, 10, "--- ok ---", 13)
    exit
end start
Quote
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
--- ok ---
Maybe if the size requested is not a multiple of 16, it
falls back to the default 8-byte alignment.
Another test is needed, I think.
If you test it on your system, could you confirm these results?
Thanks
Hi frktons,
The docs only guarantee 8-byte alignment. That doesn't necessarily mean it will be exactly 8 bytes, just that you cannot assume it will be greater than that. You're best to stay away from GlobalAlloc for anything that requires alignment anyway; VirtualAlloc is a faster and more flexible function.
Quote from: Donkey on December 23, 2012, 06:43:48 AM
Hi frktons,
The docs only guarantee 8-byte alignment. That doesn't necessarily mean it will be exactly 8 bytes, just that you cannot assume it will be greater than that. You're best to stay away from GlobalAlloc for anything that requires alignment anyway; VirtualAlloc is a faster and more flexible function.
Yes Edgar, that's probably wise advice. Anyway, if you run some tests
with your favorite API, please let me know.
I tried allocating 1024*1024+15 bytes, but I had the same result.
This makes me suspect there is something else that the docs don't say.
VirtualAlloc guarantees a 16-byte (actually: 4096-byte) alignment, but it is much slower than HeapAlloc for small allocations. Use HeapAlloc for many small non-SSE2 allocs and VirtualAlloc for the rest.
GlobalAlloc follows basically the same strategy, that's why you most probably get 16-byte alignment for fat chunks. But it is not documented...
Quote from: jj2007 on December 23, 2012, 07:37:04 AM
VirtualAlloc guarantees a 16-byte (actually: 4096-byte) alignment, but it is much
slower than HeapAlloc for small allocations. Use HeapAlloc for many small
non-SSE2 allocs and VirtualAlloc for the rest.
GlobalAlloc follows basically the same strategy, that's why you most probably get
16-byte alignment for fat chunks. But it is not documented...
Thanks Jochen.
How many KB is a small allocation?
I've read in MSDN that HeapAlloc is preferable to GlobalAlloc:
Quote
Note The global functions have greater overhead and provide fewer
features than other memory management functions. New applications
should use the heap functions unless documentation states that a
global function should be used. For more information, see Global and
Local Functions.
Frank,
The only safe way with allocated memory is to allocate more than you need then align the front and set a pointer to it. Then you don't have to guess or hope that it will have the alignment you require on different OS versions.
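A minimal sketch of this over-allocate-and-align approach, written in C with plain malloc standing in for GlobalAlloc/HeapAlloc. The type and function names (aligned_block, block_alloc, block_free) are assumptions made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair of pointers: raw is what the allocator returned and
   what must be freed; aligned is the address the program actually uses. */
typedef struct {
    void *raw;
    void *aligned;
} aligned_block;

/* Over-allocate by (alignment - 1) bytes and align the front of the block.
   alignment must be a power of two. On failure both pointers are NULL. */
aligned_block block_alloc(size_t size, size_t alignment)
{
    aligned_block b = { NULL, NULL };
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    b.raw = malloc(size + alignment - 1);
    if (b.raw != NULL) {
        uintptr_t addr = ((uintptr_t)b.raw + alignment - 1)
                         & ~(uintptr_t)(alignment - 1);
        b.aligned = (void *)addr;
    }
    return b;
}

/* Always release the original pointer, never the aligned one. */
void block_free(aligned_block *b)
{
    free(b->raw);
    b->raw = NULL;
    b->aligned = NULL;
}
```

The cost is at most alignment - 1 wasted bytes per block, and the result no longer depends on what a given OS version happens to return.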
Quote from: frktons on December 23, 2012, 09:10:40 AM
How many KB is a small allocation?
I've read in MSDN that HeapAlloc is preferable to GlobalAlloc
We put together a testbed some time ago, and the results are somewhat confusing. The MSDN rule of thumb seems to be OK, but it depends on other factors, too. Test it yourself...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
183 cycles per kByte for HeapAlloc 00004000h bytes (16 kB)
876 cycles per kByte for VirtualAlloc 00004000h bytes (16 kB)
40 cycles per kByte for GlobalAlloc 00004000h bytes (16 kB)
174 cycles per kByte for HeapAlloc 00008000h bytes (32 kB)
768 cycles per kByte for VirtualAlloc 00008000h bytes (32 kB)
20 cycles per kByte for GlobalAlloc 00008000h bytes (32 kB)
1675 cycles per kByte for HeapAlloc 00010000h bytes (64 kB)
683 cycles per kByte for VirtualAlloc 00010000h bytes (64 kB)
211 cycles per kByte for GlobalAlloc 00010000h bytes (64 kB)
192 cycles per kByte for HeapAlloc 00020000h bytes (128 kB)
648 cycles per kByte for VirtualAlloc 00020000h bytes (128 kB)
10 cycles per kByte for GlobalAlloc 00020000h bytes (128 kB)
188 cycles per kByte for HeapAlloc 00040000h bytes (256 kB)
615 cycles per kByte for VirtualAlloc 00040000h bytes (256 kB)
7 cycles per kByte for GlobalAlloc 00040000h bytes (256 kB)
874 cycles per kByte for HeapAlloc 00080000h bytes (512 kB)
873 cycles per kByte for VirtualAlloc 00080000h bytes (512 kB)
881 cycles per kByte for GlobalAlloc 00080000h bytes (512 kB)
1107 cycles per kByte for HeapAlloc 00100000h bytes (1 MB)
1107 cycles per kByte for VirtualAlloc 00100000h bytes (1 MB)
1107 cycles per kByte for GlobalAlloc 00100000h bytes (1 MB)
If you are already using the CRT, _aligned_malloc (http://msdn.microsoft.com/en-us/library/8z34s9c6(v=vs.80).aspx) may also be an option.
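As a rough illustration of the CRT route, here is a small C sketch. The wrapper names xmm_alloc and xmm_free are hypothetical. On Windows the sketch calls _aligned_malloc and _aligned_free; on other platforms it falls back to C11 aligned_alloc, purely so the example builds outside Windows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>   /* _aligned_malloc, _aligned_free */
#endif

/* Return a 16-byte aligned block of at least size bytes, or NULL. */
void *xmm_alloc(size_t size)
{
    void *p;
#ifdef _WIN32
    p = _aligned_malloc(size, 16);
#else
    /* aligned_alloc expects a size that is a multiple of the alignment */
    size_t rounded = (size + 15u) & ~(size_t)15u;
    p = aligned_alloc(16, rounded);
#endif
    assert(p == NULL || ((uintptr_t)p & 15u) == 0);
    return p;
}

void xmm_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);   /* blocks from _aligned_malloc must not go to free() */
#else
    free(p);
#endif
}
```

The point to remember is that memory obtained from _aligned_malloc has to be released with _aligned_free, not with free.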
Quote from: jj2007 on December 23, 2012, 09:34:52 AM
We put together a testbed some time ago, and the results are somewhat confusing.
The MSDN rule of thumb seems to be OK, but it depends on other factors, too. Test it yourself...
Yeah!
My testbed also shows strange things:
Quote
Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz (SSE4)
277 cycles per kByte for HeapAlloc 00004000h bytes (16 kB)
1861 cycles per kByte for VirtualAlloc 00004000h bytes (16 kB)
92 cycles per kByte for GlobalAlloc 00004000h bytes (16 kB)
267 cycles per kByte for HeapAlloc 00008000h bytes (32 kB)
1555 cycles per kByte for VirtualAlloc 00008000h bytes (32 kB)
47 cycles per kByte for GlobalAlloc 00008000h bytes (32 kB)
321 cycles per kByte for HeapAlloc 00010000h bytes (64 kB)
1429 cycles per kByte for VirtualAlloc 00010000h bytes (64 kB)
27 cycles per kByte for GlobalAlloc 00010000h bytes (64 kB)
305 cycles per kByte for HeapAlloc 00020000h bytes (128 kB)
866 cycles per kByte for VirtualAlloc 00020000h bytes (128 kB)
11 cycles per kByte for GlobalAlloc 00020000h bytes (128 kB)
300 cycles per kByte for HeapAlloc 00040000h bytes (256 kB)
1219 cycles per kByte for VirtualAlloc 00040000h bytes (256 kB)
9 cycles per kByte for GlobalAlloc 00040000h bytes (256 kB)
1207 cycles per kByte for HeapAlloc 00080000h bytes (512 kB)
1205 cycles per kByte for VirtualAlloc 00080000h bytes (512 kB)
817 cycles per kByte for GlobalAlloc 00080000h bytes (512 kB)
820 cycles per kByte for HeapAlloc 00100000h bytes (1 MB)
823 cycles per kByte for VirtualAlloc 00100000h bytes (1 MB)
811 cycles per kByte for GlobalAlloc 00100000h bytes (1 MB)
Memory was touched
Probably from 1 MB upwards HeapAlloc becomes faster.
Quote from: hutch-- on December 23, 2012, 09:21:34 AM
Frank,
The only safe way with allocated memory is to allocate more than
you need then align the front and set a pointer to it. Then you don't
have to guess or hope that it will have the alignment you require on
different OS versions.
Yes Hutch, I'm taking this advice as a guideline for cases in
which the alignment cannot be known for certain.
Quote from: qWord on December 23, 2012, 09:35:56 AM
If you are already using the CRT, _aligned_malloc (http://msdn.microsoft.com/en-us/library/8z34s9c6(v=vs.80).aspx) may also be an option.
A small example could help me to see the possibilities of this.
Some of the later memory allocation strategies were designed to ease the usage for people who don't know how to manage memory properly. A big single allocation chopped up to suit the app is always faster than many messy small allocations. The OS is pretty good at handling it but if you understand what you are doing, you do it better yourself.
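To make the "one big allocation chopped up" idea concrete, here is a minimal bump-allocator sketch in C. It is only an illustration under simple assumptions (single-threaded, blocks are never freed individually), and the names arena, arena_init, arena_alloc and arena_release are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;   /* start of the single big allocation */
    size_t         size;   /* total bytes in the arena */
    size_t         used;   /* bytes handed out so far, padding included */
} arena;

/* Grab the big block once. Returns 1 on success, 0 on failure. */
int arena_init(arena *a, size_t size)
{
    a->base = (unsigned char *)malloc(size);
    a->size = (a->base != NULL) ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Carve n bytes out of the arena on a power-of-two boundary.
   Returns NULL when the arena is exhausted. */
void *arena_alloc(arena *a, size_t n, size_t alignment)
{
    uintptr_t cur, aligned;
    size_t offset;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    cur     = (uintptr_t)a->base + a->used;
    aligned = (cur + alignment - 1) & ~(uintptr_t)(alignment - 1);
    offset  = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->size || n > a->size - offset)
        return NULL;
    a->used = offset + n;
    return a->base + offset;
}

/* Everything carved from the arena is released in one call. */
void arena_release(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Each arena_alloc is a few additions and a mask, which is why this pattern tends to beat many small calls into the OS allocator; the price is that the application has to track lifetimes itself.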
Hutch, in the future, when I know much better what I'm doing,
I'll manage this stuff myself.
For a 16 MB allocation HeapAlloc looks a little bit faster:
Quote
16.146 cycles for HeapAlloc
16.298 cycles for GlobalAlloc
16.190 cycles for HeapAlloc
16.741 cycles for GlobalAlloc
16.437 cycles for HeapAlloc
16.644 cycles for GlobalAlloc
16.062 cycles for HeapAlloc
16.732 cycles for GlobalAlloc
--- ok ---
So far I haven't found any concrete difference among the
various APIs used for allocating/freeing memory.
The Heap family seems to be richer in options, but
for simple stuff they look almost the same.
Quote from: frktons on December 23, 2012, 10:27:41 AM
A small example could help me to see the possibilities of this.
http://masm32.com/board/index.php?topic=508.msg3930#msg3930
Quote from: frktons on December 23, 2012, 11:23:40 AM
So far I haven't found any concrete difference among the
various APIs used for allocating/freeing memory.
The Heap family seems to be richer in options, but
for simple stuff they look almost the same.
Global/LocalAlloc are only present for backward compatibility with 16-bit Windows. They are just wrappers that call HeapAlloc.
Global and Local Functions (http://msdn.microsoft.com/en-us/library/aa366596%28v=vs.85%29.aspx)
For memory allocations up to 1MB HeapAlloc should be used. For allocations greater than 1MB VirtualAlloc should be used (which is called by HeapAlloc if size is more than 1MB).
Greenhorn
i know the documents say that
then, i see certain operations that state that GlobalAlloc is required - lol
examples are reading from a resource or transfers to/from the clipboard (and some GDI+ stuff)
So far the use of HeapAlloc/Create/Free/Destroy... looks like the
solution that includes all the cases [less than 1MB / more than 1MB],
GlobalAlloc/LocalAlloc being just wrappers to call HeapAlloc, and
VirtualAlloc called by HeapAlloc when needed.
The test made by Michael [about alignment] shows that, if you allocate
more than one buffer without freeing the previous ones, you can also get
8-byte alignment, not only 16 and its multiples.
::)
Quote
Win7, x64
290e80h
2937d0h
293fc0h
290e78h 8
2937c0h 64
293bb0h 16
293fa0h 32
294390h 16
294780h 128
294b70h 16
294f60h 32
295350h 16
295740h 64
295b40h 64
295f40h 64
296350h 16
296750h 16
296b60h 32
296f60h 32
297370h 16
297770h 16
297b80h 128
297f80h 128
2983c0h 64
2987c0h 64
298c00h 1024
299040h 64
299480h 128
2998c0h 64
299d00h 256
29a140h 64
29a580h 128
29a980h 128
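Assuming the second column in the listing above is the largest power of two that divides each address, that value can be computed from the lowest set bit. The following C sketch shows one way to do it; the helper names alignment_of and is_aligned are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that divides addr, i.e. its lowest set bit.
   Returns 0 for a null address. Hypothetical helper. */
uintptr_t alignment_of(uintptr_t addr)
{
    return addr & ((uintptr_t)0 - addr);
}

/* Non-zero if addr is a multiple of alignment (a power of two). */
int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

With the addresses from the listing, alignment_of(0x290e78) is 8, alignment_of(0x2937c0) is 64 and alignment_of(0x298c00) is 1024, so is_aligned(0x290e78, 16) is false. A check like this before using movdqa on a buffer is cheap.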
:biggrin:
Contrary to popular opinion, GlobalAlloc() with the GMEM_FIXED flag calls the same function in NTDLL.DLL as many of the other memory allocation strategies. It has always been fast, and it can allocate any amount of memory within physical limits and the OS address range. While the other strategies do the job, GlobalAlloc() is fast, flexible and can handle the limits while being easier to use.
Quote from: hutch-- on December 24, 2012, 09:01:49 AM
:biggrin:
Contrary to popular opinion, GlobalAlloc() with the GMEM_FIXED flag calls the same function in NTDLL.DLL as many of the other memory allocation strategies. It has always been fast, and it can allocate any amount of memory within physical limits and the OS address range. While the other strategies do the job, GlobalAlloc() is fast, flexible and can handle the limits while being easier to use.
:t
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.
Quote from: Tedd on December 29, 2012, 05:13:44 AM
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.
Yes Tedd, that's what they say.
Quote from: frktons on December 29, 2012, 07:19:39 AM
Quote from: Tedd on December 29, 2012, 05:13:44 AM
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.
Yes Tedd, that's what they say.
Saying it doesn't make it true, but it's easily verified.
Quote from: Tedd on December 29, 2012, 08:13:06 AM
Quote from: frktons on December 29, 2012, 07:19:39 AM
Quote from: Tedd on December 29, 2012, 05:13:44 AM
GlobalAlloc does call RtlAllocateHeap, eventually... (after fiddling with the parameters and setting up.)
HeapAlloc is routed directly to RtlAllocateHeap.
Yes Tedd, that's what they say.
Saying it doesn't make it true, but it's easily verified.
I mean it is what the official documentation says. And of course
it can be verified. :t
Hi Jochen,
Did you test malloc exported by msvcrt.dll? What are the results?
wouldn't msvcrt require some initialization ?
so hard to tell - lol
Quote from: dedndave on December 30, 2012, 12:46:14 AM
wouldn't msvcrt require some initialization ?
so hard to tell - lol
Hi Dave,
No need for initialization :
include \masm32\include\masm32rt.inc

BUFFER_SIZE = 64

.data
BuffSize dd BUFFER_SIZE

.data?
pMem dd ?

.code
start:
    invoke crt_malloc,BUFFER_SIZE
    test eax,eax
    jz @f
    mov pMem,eax
    invoke GetComputerName,eax,ADDR BuffSize
    invoke StdOut,pMem
    invoke crt_free,pMem
@@:
    invoke ExitProcess,0
END start
http://msdn.microsoft.com/en-us/library/aa246461(v=vs.60).aspx
Quote from: Vortex on December 29, 2012, 07:53:58 PM
Hi Jochen,
Did you test malloc exported by msvcrt.dll? What are the results?
Here it is. And the results are interesting :P
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
32 cycles per kByte for HeapAlloc plain 00004000h bytes (16 kB)
174 cycles per kByte for HeapAlloc ZINIT 00004000h bytes (16 kB)
868 cycles per kByte for VirtualAlloc 00004000h bytes (16 kB)
40 cycles per kByte for GlobalAlloc 00004000h bytes (16 kB)
52 cycles per kByte for malloc 00004000h bytes (16 kB)
17 cycles per kByte for HeapAlloc plain 00008000h bytes (32 kB)
169 cycles per kByte for HeapAlloc ZINIT 00008000h bytes (32 kB)
752 cycles per kByte for VirtualAlloc 00008000h bytes (32 kB)
20 cycles per kByte for GlobalAlloc 00008000h bytes (32 kB)
29 cycles per kByte for malloc 00008000h bytes (32 kB)
657 cycles per kByte for HeapAlloc plain 00010000h bytes (64 kB)
601 cycles per kByte for HeapAlloc ZINIT 00010000h bytes (64 kB)
685 cycles per kByte for VirtualAlloc 00010000h bytes (64 kB)
17 cycles per kByte for GlobalAlloc 00010000h bytes (64 kB)
661 cycles per kByte for malloc 00010000h bytes (64 kB)
11 cycles per kByte for HeapAlloc plain 00020000h bytes (128 kB)
189 cycles per kByte for HeapAlloc ZINIT 00020000h bytes (128 kB)
622 cycles per kByte for VirtualAlloc 00020000h bytes (128 kB)
10 cycles per kByte for GlobalAlloc 00020000h bytes (128 kB)
183 cycles per kByte for malloc 00020000h bytes (128 kB)
8 cycles per kByte for HeapAlloc plain 00040000h bytes (256 kB)
185 cycles per kByte for HeapAlloc ZINIT 00040000h bytes (256 kB)
645 cycles per kByte for VirtualAlloc 00040000h bytes (256 kB)
7 cycles per kByte for GlobalAlloc 00040000h bytes (256 kB)
9 cycles per kByte for malloc 00040000h bytes (256 kB)
999 cycles per kByte for HeapAlloc plain 00080000h bytes (512 kB)
987 cycles per kByte for HeapAlloc ZINIT 00080000h bytes (512 kB)
1001 cycles per kByte for VirtualAlloc 00080000h bytes (512 kB)
1022 cycles per kByte for GlobalAlloc 00080000h bytes (512 kB)
996 cycles per kByte for malloc 00080000h bytes (512 kB)
1165 cycles per kByte for HeapAlloc plain 00100000h bytes (1 MB)
1153 cycles per kByte for HeapAlloc ZINIT 00100000h bytes (1 MB)
1165 cycles per kByte for VirtualAlloc 00100000h bytes (1 MB)
1173 cycles per kByte for GlobalAlloc 00100000h bytes (1 MB)
1154 cycles per kByte for malloc 00100000h bytes (1 MB)
1247 cycles per kByte for HeapAlloc plain 00200000h bytes (2 MB)
1249 cycles per kByte for HeapAlloc ZINIT 00200000h bytes (2 MB)
1252 cycles per kByte for VirtualAlloc 00200000h bytes (2 MB)
1236 cycles per kByte for GlobalAlloc 00200000h bytes (2 MB)
1238 cycles per kByte for malloc 00200000h bytes (2 MB)
590 cycles per kByte for HeapAlloc plain 00400000h bytes (4 MB)
591 cycles per kByte for HeapAlloc ZINIT 00400000h bytes (4 MB)
600 cycles per kByte for VirtualAlloc 00400000h bytes (4 MB)
592 cycles per kByte for GlobalAlloc 00400000h bytes (4 MB)
586 cycles per kByte for malloc 00400000h bytes (4 MB)
Quote from: jj2007 on December 30, 2012, 05:52:44 AM
Here it is. And the results are interesting :P
The results are likely misleading -- I expect you'd get different results if you change the order of the tests.
Each allocation affects the state of memory within the process, and the state of Windows' memory manager.
This unfortunately means that the time taken for any particular allocation is dependent on what previous allocations have been made globally, and potentially by what method.
To get more representative results: randomize the order of all allocation tests (mix up sizes and methods) and repeat the tests multiple times, then take average allocation times.
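The procedure described here (shuffle all size/method combinations, repeat, average) could look roughly like the following C sketch. It is not the testbed used in this thread: it times with clock() instead of counting cycles, uses malloc and calloc as stand-ins for the Windows APIs, and all names (bench_case, run_all and so on) are assumptions for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* One test case: an allocation method, a size, and its accumulated timing. */
typedef struct {
    const char *name;
    void *(*alloc)(size_t);
    void  (*release)(void *);
    size_t size;
    double total_seconds;
    int    runs;
} bench_case;

/* Stand-in for a zero-initialising allocator such as HeapAlloc ZINIT. */
void *alloc_zeroed(size_t n) { return calloc(1, n); }

/* Fisher-Yates shuffle of an index array. */
void shuffle(int *idx, int n)
{
    for (int i = n - 1; i > 0; --i) {
        int j = rand() % (i + 1);
        int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
}

/* Allocate, touch one byte per 4096-byte page, free, and record the time. */
void run_case(bench_case *c)
{
    clock_t t0 = clock();
    volatile unsigned char *p = (volatile unsigned char *)c->alloc(c->size);
    if (p != NULL) {
        for (size_t i = 0; i < c->size; i += 4096)
            p[i] = 1;
        c->release((void *)p);
    }
    c->total_seconds += (double)(clock() - t0) / CLOCKS_PER_SEC;
    c->runs++;
}

/* Run every case 'repeats' times, in a fresh random order each round. */
void run_all(bench_case *cases, int n, int repeats)
{
    int *order = (int *)malloc((size_t)n * sizeof *order);
    assert(order != NULL);
    for (int r = 0; r < repeats; ++r) {
        for (int i = 0; i < n; ++i)
            order[i] = i;
        shuffle(order, n);
        for (int i = 0; i < n; ++i)
            run_case(&cases[order[i]]);
    }
    free(order);
}

double average_seconds(const bench_case *c)
{
    return c->runs ? c->total_seconds / c->runs : 0.0;
}
```

A caller would fill an array of bench_case entries (for example malloc and alloc_zeroed at 16 kB, 64 kB and 1 MB), call run_all, and then report average_seconds for each entry. Because the order changes every round, no method systematically benefits from running right after another one.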
Quote from: Tedd on December 30, 2012, 06:50:05 AM
The results are likely misleading -- I expect you'd get different results if you change the order of the tests.
For you, Tedd. They look totally different because the order is reversed.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
50 cycles per kByte for malloc 00004000h bytes (16 kB)
50 cycles per kByte for GlobalAlloc 00004000h bytes (16 kB)
889 cycles per kByte for VirtualAlloc 00004000h bytes (16 kB)
174 cycles per kByte for HeapAlloc ZINIT 00004000h bytes (16 kB)
26 cycles per kByte for HeapAlloc plain 00004000h bytes (16 kB)
31 cycles per kByte for malloc 00008000h bytes (32 kB)
25 cycles per kByte for GlobalAlloc 00008000h bytes (32 kB)
770 cycles per kByte for VirtualAlloc 00008000h bytes (32 kB)
169 cycles per kByte for HeapAlloc ZINIT 00008000h bytes (32 kB)
13 cycles per kByte for HeapAlloc plain 00008000h bytes (32 kB)
686 cycles per kByte for malloc 00010000h bytes (64 kB)
682 cycles per kByte for GlobalAlloc 00010000h bytes (64 kB)
686 cycles per kByte for VirtualAlloc 00010000h bytes (64 kB)
602 cycles per kByte for HeapAlloc ZINIT 00010000h bytes (64 kB)
14 cycles per kByte for HeapAlloc plain 00010000h bytes (64 kB)
189 cycles per kByte for malloc 00020000h bytes (128 kB)
13 cycles per kByte for GlobalAlloc 00020000h bytes (128 kB)
628 cycles per kByte for VirtualAlloc 00020000h bytes (128 kB)
190 cycles per kByte for HeapAlloc ZINIT 00020000h bytes (128 kB)
9 cycles per kByte for HeapAlloc plain 00020000h bytes (128 kB)
9 cycles per kByte for malloc 00040000h bytes (256 kB)
9 cycles per kByte for GlobalAlloc 00040000h bytes (256 kB)
595 cycles per kByte for VirtualAlloc 00040000h bytes (256 kB)
187 cycles per kByte for HeapAlloc ZINIT 00040000h bytes (256 kB)
6 cycles per kByte for HeapAlloc plain 00040000h bytes (256 kB)
600 cycles per kByte for malloc 00080000h bytes (512 kB)
595 cycles per kByte for GlobalAlloc 00080000h bytes (512 kB)
605 cycles per kByte for VirtualAlloc 00080000h bytes (512 kB)
605 cycles per kByte for HeapAlloc ZINIT 00080000h bytes (512 kB)
595 cycles per kByte for HeapAlloc plain 00080000h bytes (512 kB)
786 cycles per kByte for malloc 00100000h bytes (1 MB)
785 cycles per kByte for GlobalAlloc 00100000h bytes (1 MB)
790 cycles per kByte for VirtualAlloc 00100000h bytes (1 MB)
811 cycles per kByte for HeapAlloc ZINIT 00100000h bytes (1 MB)
800 cycles per kByte for HeapAlloc plain 00100000h bytes (1 MB)
1054 cycles per kByte for malloc 00200000h bytes (2 MB)
1054 cycles per kByte for GlobalAlloc 00200000h bytes (2 MB)
1091 cycles per kByte for VirtualAlloc 00200000h bytes (2 MB)
1066 cycles per kByte for HeapAlloc ZINIT 00200000h bytes (2 MB)
1057 cycles per kByte for HeapAlloc plain 00200000h bytes (2 MB)
612 cycles per kByte for malloc 00400000h bytes (4 MB)
609 cycles per kByte for GlobalAlloc 00400000h bytes (4 MB)
613 cycles per kByte for VirtualAlloc 00400000h bytes (4 MB)
594 cycles per kByte for HeapAlloc ZINIT 00400000h bytes (4 MB)
614 cycles per kByte for HeapAlloc plain 00400000h bytes (4 MB)
Yup...
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
Run A (cycles per kByte)
         HAlloc  HAllocZ   VAlloc   GAlloc   malloc
16k          32      174      868       40       52
32k          17      169      752       20       29
64k         657      601      685       17      661
128k         11      189      622       10      183
256k          8      185      645        7        9
512k        999      987     1001     1022      996
1M         1165     1153     1165     1173     1154
2M         1247     1249     1252     1236     1238
4M          590      591      600      592      586

Run B (cycles per kByte)
         HAlloc  HAllocZ   VAlloc   GAlloc   malloc
16k          26      174      889       50       50
32k          13      169      770       25       31
64k          14      602      686      682      686
128k          9      190      628       13      189
256k          6      187      595        9        9
512k        595      605      605      595      600
1M          800      811      790      785      786
2M         1057     1066     1091     1054     1054
4M          614      594      613      609      612

Abs. Diff. (Run A vs. Run B)
         HAlloc  HAllocZ   VAlloc   GAlloc   malloc
16k           6        0       21       10        2
32k           4        0       18        5        2
64k         643        1        1      665       25
128k          2        1        6        3        6
256k          2        2       50        2        0
512k        404      382      396      427      396
1M          365      342      375      388      368
2M          190      183      161      182      184
4M           24        3       13       17       26
Quote from: Tedd on December 30, 2012, 06:50:05 AM
To get more representative results: randomize the order of all allocation tests (mix up sizes and methods) and repeat the tests multiple times, then take average allocation times.
Tedd, you are perfectly right. I have no time to do it, but I think everybody would appreciate it if you implemented that idea.
Quote from: jj2007 on December 30, 2012, 08:09:54 AM
Tedd, you are perfectly right. I have no time to do it, but I think everybody would appreciate it if you implemented that idea.
Agreed. If Tedd posts these results it will be quite useful indeed.
I think from memory that anything under 64k is a waste of time due to page size in any memory allocation strategy. The other factor of course is that once you have memory allocated, they all perform the same, memory is memory. Apart from the normal strategies like Heap, Global and Virtual allocs, OLE string memory was useful as it was more closely connected to the OS and appeared to come from a different address range. It was a bit slower in allocation but performed fine once allocated.
Don't forget things like page tables, TLB and the like.
Windows also has an idle thread that zeroes free pages, so ZERO_INIT times will be everywhere too.
It's been a few years since I've used GlobalAlloc, so I checked the MS documentation, and I guess MS considers the global functions obsolete.
The global functions have greater overhead and provide fewer features than other memory management functions. New applications should use the heap functions unless documentation states that a global function should be used. For more information, see Global and Local Functions.