GlobalAlloc, GlobalFree, Malloc and friends

Started by frktons, December 23, 2012, 05:23:05 AM

jj2007

Quote from: Vortex on December 29, 2012, 07:53:58 PM
Hi Jochen,

Did you test malloc exported by msvcrt.dll? How is the result?

Here it is. And the results are interesting :P

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
32   cycles per kByte for HeapAlloc plain 00004000h bytes (16 kB)
174   cycles per kByte for HeapAlloc ZINIT 00004000h bytes (16 kB)
868   cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
40   cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)
52   cycles per kByte for malloc          00004000h bytes (16 kB)

17   cycles per kByte for HeapAlloc plain 00008000h bytes (32 kB)
169   cycles per kByte for HeapAlloc ZINIT 00008000h bytes (32 kB)
752   cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
20   cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)
29   cycles per kByte for malloc          00008000h bytes (32 kB)

657   cycles per kByte for HeapAlloc plain 00010000h bytes (64 kB)
601   cycles per kByte for HeapAlloc ZINIT 00010000h bytes (64 kB)
685   cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
17   cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)
661   cycles per kByte for malloc          00010000h bytes (64 kB)

11   cycles per kByte for HeapAlloc plain 00020000h bytes (128 kB)
189   cycles per kByte for HeapAlloc ZINIT 00020000h bytes (128 kB)
622   cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
10   cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)
183   cycles per kByte for malloc          00020000h bytes (128 kB)

8   cycles per kByte for HeapAlloc plain 00040000h bytes (256 kB)
185   cycles per kByte for HeapAlloc ZINIT 00040000h bytes (256 kB)
645   cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
7   cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)
9   cycles per kByte for malloc          00040000h bytes (256 kB)

999   cycles per kByte for HeapAlloc plain 00080000h bytes (512 kB)
987   cycles per kByte for HeapAlloc ZINIT 00080000h bytes (512 kB)
1001   cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
1022   cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)
996   cycles per kByte for malloc          00080000h bytes (512 kB)

1165   cycles per kByte for HeapAlloc plain 00100000h bytes (1 MB)
1153   cycles per kByte for HeapAlloc ZINIT 00100000h bytes (1 MB)
1165   cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
1173   cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)
1154   cycles per kByte for malloc          00100000h bytes (1 MB)

1247   cycles per kByte for HeapAlloc plain 00200000h bytes (2 MB)
1249   cycles per kByte for HeapAlloc ZINIT 00200000h bytes (2 MB)
1252   cycles per kByte for VirtualAlloc    00200000h bytes (2 MB)
1236   cycles per kByte for GlobalAlloc     00200000h bytes (2 MB)
1238   cycles per kByte for malloc          00200000h bytes (2 MB)

590   cycles per kByte for HeapAlloc plain 00400000h bytes (4 MB)
591   cycles per kByte for HeapAlloc ZINIT 00400000h bytes (4 MB)
600   cycles per kByte for VirtualAlloc    00400000h bytes (4 MB)
592   cycles per kByte for GlobalAlloc     00400000h bytes (4 MB)
586   cycles per kByte for malloc          00400000h bytes (4 MB)

Tedd

Quote from: jj2007 on December 30, 2012, 05:52:44 AM
Here it is. And the results are interesting :P
The results are likely misleading -- I expect you'd get different results if you change the order of the tests.

Each allocation affects the state of memory within the process, and the state of Windows' memory manager.
This unfortunately means that the time taken for any particular allocation depends on which allocations have already been made globally, and potentially on the method used to make them.

To get more representative results: randomize the order of all allocation tests (mix up sizes and methods) and repeat the tests multiple times, then take average allocation times.
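
A rough sketch of that procedure in C, under loud assumptions: the names bench_case, run_randomized and average_seconds are made up for illustration, malloc and calloc are portable stand-ins for the Windows allocators tested in this thread, and clock() is only a coarse placeholder for the cycle counter used in the original runs. It is not the code behind the numbers above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

/* One benchmark case: an allocator paired with a block size. */
typedef struct {
    const char *name;    /* label for a report */
    alloc_fn    alloc;   /* allocator under test */
    free_fn     release; /* matching deallocator */
    size_t      size;    /* bytes per allocation, assumed > 0 */
    double      total;   /* accumulated seconds */
    unsigned    runs;    /* number of timed runs */
} bench_case;

/* Placeholder allocators (assumption). On Windows, wrappers around
   HeapAlloc, VirtualAlloc and GlobalAlloc would be plugged in here. */
void *alloc_plain(size_t n)  { return malloc(n); }
void *alloc_zeroed(size_t n) { return calloc(1, n); }

/* Fisher-Yates shuffle of an index array. */
static void shuffle(size_t *idx, size_t n) {
    if (n < 2) return;
    for (size_t i = n - 1; i > 0; --i) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }
}

/* Runs every case once per round, in a fresh random order each round.
   Returns 0 on success, -1 if the index array cannot be allocated. */
int run_randomized(bench_case *cases, size_t n, unsigned rounds) {
    if (n == 0) return 0;
    size_t *idx = malloc(n * sizeof *idx);
    if (idx == NULL) return -1;
    for (unsigned r = 0; r < rounds; ++r) {
        for (size_t i = 0; i < n; ++i) idx[i] = i;
        shuffle(idx, n);
        for (size_t i = 0; i < n; ++i) {
            bench_case *c = &cases[idx[i]];
            clock_t t0 = clock();
            volatile unsigned char *p = c->alloc(c->size);
            if (p != NULL) {
                if (c->size > 0) {      /* touch both ends of the block */
                    p[0] = 1;
                    p[c->size - 1] = 1;
                }
                c->release((void *)p);
            }
            c->total += (double)(clock() - t0) / CLOCKS_PER_SEC;
            c->runs++;
        }
    }
    free(idx);
    return 0;
}

/* Average time per run for one case, 0.0 if it never ran. */
double average_seconds(const bench_case *c) {
    return c->runs ? c->total / c->runs : 0.0;
}
```

Filling an array of bench_case entries with every size/method pair and calling run_randomized(cases, count, rounds) mixes sizes and methods in each round, so no allocator always runs first or last.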

jj2007

Quote from: Tedd on December 30, 2012, 06:50:05 AM
The results are likely misleading -- I expect you'd get different results if you change the order of the tests.

For you, Tedd. They look totally different because the order is reversed.
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
50      cycles per kByte for malloc          00004000h bytes (16 kB)
50      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)
889     cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
174     cycles per kByte for HeapAlloc ZINIT 00004000h bytes (16 kB)
26      cycles per kByte for HeapAlloc plain 00004000h bytes (16 kB)

31      cycles per kByte for malloc          00008000h bytes (32 kB)
25      cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)
770     cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
169     cycles per kByte for HeapAlloc ZINIT 00008000h bytes (32 kB)
13      cycles per kByte for HeapAlloc plain 00008000h bytes (32 kB)

686     cycles per kByte for malloc          00010000h bytes (64 kB)
682     cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)
686     cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
602     cycles per kByte for HeapAlloc ZINIT 00010000h bytes (64 kB)
14      cycles per kByte for HeapAlloc plain 00010000h bytes (64 kB)

189     cycles per kByte for malloc          00020000h bytes (128 kB)
13      cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)
628     cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
190     cycles per kByte for HeapAlloc ZINIT 00020000h bytes (128 kB)
9       cycles per kByte for HeapAlloc plain 00020000h bytes (128 kB)

9       cycles per kByte for malloc          00040000h bytes (256 kB)
9       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)
595     cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
187     cycles per kByte for HeapAlloc ZINIT 00040000h bytes (256 kB)
6       cycles per kByte for HeapAlloc plain 00040000h bytes (256 kB)

600     cycles per kByte for malloc          00080000h bytes (512 kB)
595     cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)
605     cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
605     cycles per kByte for HeapAlloc ZINIT 00080000h bytes (512 kB)
595     cycles per kByte for HeapAlloc plain 00080000h bytes (512 kB)

786     cycles per kByte for malloc          00100000h bytes (1 MB)
785     cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)
790     cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
811     cycles per kByte for HeapAlloc ZINIT 00100000h bytes (1 MB)
800     cycles per kByte for HeapAlloc plain 00100000h bytes (1 MB)

1054    cycles per kByte for malloc          00200000h bytes (2 MB)
1054    cycles per kByte for GlobalAlloc     00200000h bytes (2 MB)
1091    cycles per kByte for VirtualAlloc    00200000h bytes (2 MB)
1066    cycles per kByte for HeapAlloc ZINIT 00200000h bytes (2 MB)
1057    cycles per kByte for HeapAlloc plain 00200000h bytes (2 MB)

612     cycles per kByte for malloc          00400000h bytes (4 MB)
609     cycles per kByte for GlobalAlloc     00400000h bytes (4 MB)
613     cycles per kByte for VirtualAlloc    00400000h bytes (4 MB)
594     cycles per kByte for HeapAlloc ZINIT 00400000h bytes (4 MB)
614     cycles per kByte for HeapAlloc plain 00400000h bytes (4 MB)

Tedd

Yup...


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)

Run A
        HAlloc  HAllocZ VAlloc  GAlloc  malloc
16k       32     174     868      40      52
32k       17     169     752      20      29
64k      657     601     685      17     661
128k      11     189     622      10     183
256k       8     185     645       7       9
512k     999     987    1001    1022     996
1M      1165    1153    1165    1173    1154
2M      1247    1249    1252    1236    1238
4M       590     591     600     592     586

Run B
        HAlloc  HAllocZ VAlloc  GAlloc  malloc
16k       26     174     889      50      50
32k       13     169     770      25      31
64k       14     602     686     682     686
128k       9     190     628      13     189
256k       6     187     595       9       9
512k     595     605     605     595     600
1M       800     811     790     785     786
2M      1057    1066    1091    1054    1054
4M       614     594     613     609     612


Abs. Diff.
        HAlloc  HAllocZ VAlloc  GAlloc  malloc
16k        6       0      21      10       2
32k        4       0      18       5       2
64k      643       1       1     665      25
128k       2       1       6       3       6
256k       2       2      50       2       0
512k     404     382     396     427     396
1M       365     342     375     388     368
2M       190     183     161     182     184
4M        24       3      13      17      26
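
For anyone who wants to recheck the difference table, here is a minimal C sketch that recomputes it from the Run A and Run B figures. The array and function names (run_a, run_b, abs_diff) are made up for illustration. The numbers are copied from the two tables above.

```c
#include <stdlib.h>

enum { SIZES = 9, METHODS = 5 };

/* Cycles per kByte. Rows: 16k, 32k, 64k, 128k, 256k, 512k, 1M, 2M, 4M.
   Columns: HAlloc, HAllocZ, VAlloc, GAlloc, malloc. */
static const int run_a[SIZES][METHODS] = {
    {  32,  174,  868,   40,   52},
    {  17,  169,  752,   20,   29},
    { 657,  601,  685,   17,  661},
    {  11,  189,  622,   10,  183},
    {   8,  185,  645,    7,    9},
    { 999,  987, 1001, 1022,  996},
    {1165, 1153, 1165, 1173, 1154},
    {1247, 1249, 1252, 1236, 1238},
    { 590,  591,  600,  592,  586},
};

static const int run_b[SIZES][METHODS] = {
    {  26,  174,  889,   50,   50},
    {  13,  169,  770,   25,   31},
    {  14,  602,  686,  682,  686},
    {   9,  190,  628,   13,  189},
    {   6,  187,  595,    9,    9},
    { 595,  605,  605,  595,  600},
    { 800,  811,  790,  785,  786},
    {1057, 1066, 1091, 1054, 1054},
    { 614,  594,  613,  609,  612},
};

/* Stores |a - b| for every size/method pair in out. */
void abs_diff(const int a[SIZES][METHODS], const int b[SIZES][METHODS],
              int out[SIZES][METHODS]) {
    for (int i = 0; i < SIZES; ++i)
        for (int j = 0; j < METHODS; ++j)
            out[i][j] = abs(a[i][j] - b[i][j]);
}
```

Calling abs_diff(run_a, run_b, out) fills out with the same layout as the tables, one row per size and one column per method.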



Quote from: Tedd on December 30, 2012, 06:50:05 AM
To get more representative results: randomize the order of all allocation tests (mix up sizes and methods) and repeat the tests multiple times, then take average allocation times.

jj2007

Tedd, you are perfectly right. I have no time to do it, but I think everybody will appreciate it if you implement that idea.

frktons

Quote from: jj2007 on December 30, 2012, 08:09:54 AM
Tedd, you are perfectly right. I have no time to do it, but I think everybody will appreciate it if you implement that idea.
Agreed. If Tedd posts these results it will be quite useful indeed.

hutch--

I think from memory that anything under 64k is a waste of time due to page size in any memory allocation strategy. The other factor, of course, is that once you have memory allocated, it all performs the same: memory is memory. Apart from the normal strategies like Heap, Global and Virtual allocs, OLE string memory was useful, as it was more closely connected to the OS and appeared to come from a different address range. It was a bit slower to allocate but performed fine once allocated.
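
A note on the 64k figure, with a small C sketch. On typical x86 Windows the page size is 4 kB, while 64 kB is the allocation granularity that VirtualAlloc uses for the base address of a reservation. The constants below are assumed typical values, and the function names are made up for illustration. A real program would read both numbers from GetSystemInfo (dwPageSize and dwAllocationGranularity) rather than hard-coding them.

```c
#include <stddef.h>

/* Assumed typical x86 Windows values, not queried from the OS. */
#define PAGE_SIZE_BYTES   4096u
#define ALLOC_GRANULARITY 65536u

/* Rounds n up to the next multiple of unit (unit must be non-zero). */
size_t round_up(size_t n, size_t unit) {
    return (n + unit - 1) / unit * unit;
}

/* Address space effectively taken by one fresh reservation of n bytes:
   the next reservation has to start on a granularity boundary. */
size_t reserved_span(size_t n) {
    return round_up(n, ALLOC_GRANULARITY);
}

/* Bytes actually committed for n bytes: whole pages only. */
size_t committed_bytes(size_t n) {
    return round_up(n, PAGE_SIZE_BYTES);
}
```

Under these assumptions a 16 kB VirtualAlloc request commits 16 kB but occupies a full 64 kB slot of address space, which is one way to read the "under 64k is a waste" remark. It applies to VirtualAlloc and does not carry over directly to heap allocations.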

sinsi

Don't forget things like page tables, TLB and the like.
Windows also has an idle thread that zeroes free pages, so ZERO_INIT times will be all over the place too.

Don57

It's been a few years since I've used GlobalAlloc, so I checked the MS documentation, and I guess MS considers the global functions obsolete.

The global functions have greater overhead and provide fewer features than other memory management functions. New applications should use the heap functions unless documentation states that a global function should be used. For more information, see Global and Local Functions.
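
As a minimal sketch of what that advice looks like in C: a zero-initialised GlobalAlloc(GPTR, bytes) call can usually be swapped for HeapAlloc on the process heap. The helper names buf_alloc_zeroed and buf_free are hypothetical, and the non-Windows branch is only there so the sketch compiles elsewhere.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <stdlib.h>
#endif

/* Hypothetical helper: returns a zero-filled block or NULL on failure. */
void *buf_alloc_zeroed(size_t bytes) {
#ifdef _WIN32
    /* Heap-function counterpart of GlobalAlloc(GPTR, bytes). */
    return HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, bytes);
#else
    return calloc(1, bytes);    /* portable stand-in */
#endif
}

/* Hypothetical helper: frees a block from buf_alloc_zeroed.
   Returns 0 on success, -1 on failure. */
int buf_free(void *p) {
#ifdef _WIN32
    return HeapFree(GetProcessHeap(), 0, p) ? 0 : -1;
#else
    free(p);
    return 0;
#endif
}
```

Code that needs a real global handle, for example for the clipboard, still has to use the Global functions, which is the exception the quoted text mentions.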