GlobalAlloc, GlobalFree, Malloc and friends

Started by frktons, December 23, 2012, 05:23:05 AM

frktons

Hi friends. Happy Holidays.

I'm trying to understand whether it is possible, when allocating memory, to
have it aligned on a 4-, 8-, or 16-byte boundary. I haven't found any exhaustive info so far.
The second point I'm still not clear on is when it is
important to deallocate/free previously allocated memory.

Thanks for your help.

Frank
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

it's already aligned
i think by 16

if it were not...
you could allocate 15 extra bytes
store the returned address to be used by the Free function
then ADD 15 to it, then AND it with 0FFFFFFF0h and use that as an aligned access address

frktons

Quote from: dedndave on December 23, 2012, 05:53:18 AM
it's already aligned
i think by 16

if it were not...
you could allocate 15 extra bytes
store the returned address to be used by the Free function
then ADD 15 to it, then AND it with 0FFFFFFF0h and use that as an aligned access address

According to MSDN:
Quote
http://msdn.microsoft.com/en-us/library/aa366574(v=vs.85).aspx

says, inter alia


Memory allocated with this function is guaranteed to be aligned on an 8-byte boundary.
To execute dynamically generated code, use the VirtualAlloc function to allocate memory
and the VirtualProtect function to grant PAGE_EXECUTE access.



It seems the standard allocation is 8-byte aligned, so for 16-byte alignment
something like what you suggested is probably needed.
I'll run some tests with GlobalAlloc/HeapAlloc and see what alignment
I get.

Donkey

GlobalAlloc is aligned on 8 byte boundaries.

// Sorry frktons, I was typing while you answered....
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

frktons

Quote from: Donkey on December 23, 2012, 06:03:15 AM
GlobalAlloc is aligned on 8 byte boundaries.

// Sorry frktons, I was typing while you answered....

Thanks. No problem Edgar.
I recall that in an old program we used GlobalAlloc to allocate
16 MB and then accessed it with SSE2 instructions (movdqa) and XMM registers,
which require 16-byte alignment.
And it worked. I'm wondering why it worked.
Shouldn't it have thrown a General Protection Fault?


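      invoke GlobalAlloc, GMEM_ZEROINIT or GMEM_FIXED, 16*1024*1024
      mov DataPtr, eax
      .....
      mov edx, DataPtr
      lea ecx, [edx+16000000]     ; end address: 200000 iterations of 80 bytes
      mov eax, 20202020h          ; four ASCII spaces
      movd xmm0, eax
      pshufd xmm0, xmm0, 0        ; broadcast the dword to all four lanes
      movdqa xmm1, xmm0
      movdqa xmm2, xmm0
      movdqa xmm3, xmm0
      movdqa xmm4, xmm0

@@:
      movdqa [edx], xmm0          ; movdqa faults unless the address is 16-byte aligned
      movdqa [edx + 16], xmm1
      movdqa [edx + 32], xmm2
      movdqa [edx + 48], xmm3
      movdqa [edx + 64], xmm4
      add edx, 80
      cmp edx, ecx
      jb @B                       ; unsigned compare (was jl): edx and ecx hold addresses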


frktons

OK. The test shows that the GlobalAlloc() API returns memory on 16-byte
boundaries, at least on my system and
with a size of 16 MB.

;-----------------------------------------------------
; TestAlloc.asm
;-----------------------------------------------------
; Test 10 times the GlobalAlloc API and displays the
; alignment of allocated memory blocks
; frktons - 22-dec-2012
;-----------------------------------------------------

.nolist
include \masm32\include\masm32rt.inc
.686

.DATA

    DataPtr  dd 0

.CODE

start:

    mov ecx, 10

alloc_again:

    push ecx

    invoke GlobalAlloc,GMEM_ZEROINIT or GMEM_FIXED,16*1024*1024
    mov DataPtr, eax 

    and eax, 15

    .IF ( eax == 0)

        print " The allocated memory is 16 bytes aligned", 13, 10
        jmp  next_alloc

    .ENDIF

    mov eax, DataPtr
    and eax, 7

    .IF ( eax == 0)

        print " The allocated memory is 8 bytes aligned", 13, 10

    .ENDIF

next_alloc:

    invoke GlobalFree, DataPtr

    pop  ecx
    dec  ecx
    jnz  alloc_again   
   
end_test:
    inkey chr$(13, 10, "--- ok ---", 13)
    exit

end start


Quote
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned
The allocated memory is 16 bytes aligned

--- ok ---

Maybe if the size to allocate is not a multiple of 16, it
falls back to the default 8-byte alignment.
Another test is needed, I think.

If you test it on your system, would you confirm this data?
Thanks

Donkey

Hi frktons,

The docs only guarantee 8-byte alignment. That doesn't necessarily mean it will be 8 bytes, just that you cannot assume it will be greater than that. You're best to stay away from GlobalAlloc for anything that requires alignment anyway; VirtualAlloc is a faster and more flexible function.

frktons

Quote from: Donkey on December 23, 2012, 06:43:48 AM
Hi frktons,

The docs only guarantee 8-byte alignment. That doesn't necessarily mean it will be 8 bytes, just that you cannot assume it will be greater than that. You're best to stay away from GlobalAlloc for anything that requires alignment anyway; VirtualAlloc is a faster and more flexible function.

Yes Edgar, that's probably wise advice. Anyway, if you run some tests
with your favorite API, please let me know.

I tried allocating 1024*1024+15 bytes, but I got the same result.
This makes me suspect there is something else that the docs don't say.

jj2007

VirtualAlloc guarantees a 16-byte (actually: 4096-byte) alignment, but it is much slower than HeapAlloc for small allocations. Use HeapAlloc for many small non-SSE2 allocs and VirtualAlloc for the rest.

GlobalAlloc follows basically the same strategy, that's why you most probably get 16-byte alignment for fat chunks. But it is not documented...

frktons

Quote from: jj2007 on December 23, 2012, 07:37:04 AM
VirtualAlloc guarantees a 16-byte (actually: 4096-byte) alignment, but it is much
slower than HeapAlloc for small allocations. Use HeapAlloc for many small
non-SSE2 allocs and VirtualAlloc for the rest.

GlobalAlloc follows basically the same strategy, that's why you most probably get
16-byte alignment for fat chunks. But it is not documented...

Thanks Jochen.

How many KB is a small allocation?
I've read in MSDN that HeapAlloc is preferable to GlobalAlloc:
Quote
Note  The global functions have greater overhead and provide fewer
features than other memory management functions. New applications
should use the heap functions unless documentation states that a
global function should be used. For more information, see Global and
Local Functions.

hutch--

Frank,

The only safe way with allocated memory is to allocate more than you need then align the front and set a pointer to it. Then you don't have to guess or hope that it will have the alignment you require on different OS versions.

jj2007

Quote from: frktons on December 23, 2012, 09:10:40 AM
How many KB is a small allocation?
I've read in MSDN that HeapAlloc is preferable to GlobalAlloc

We put together a testbed some time ago, and the results are somewhat confusing. The MSDN rule of thumb seems to be OK, but it depends on other factors, too. Test it yourself...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
183     cycles per kByte for HeapAlloc       00004000h bytes (16 kB)
876     cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
40      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)

174     cycles per kByte for HeapAlloc       00008000h bytes (32 kB)
768     cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
20      cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)

1675    cycles per kByte for HeapAlloc       00010000h bytes (64 kB)
683     cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
211     cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)

192     cycles per kByte for HeapAlloc       00020000h bytes (128 kB)
648     cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
10      cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)

188     cycles per kByte for HeapAlloc       00040000h bytes (256 kB)
615     cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
7       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)

874     cycles per kByte for HeapAlloc       00080000h bytes (512 kB)
873     cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
881     cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)

1107    cycles per kByte for HeapAlloc       00100000h bytes (1 MB)
1107    cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
1107    cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)

qWord

If already using the CRT, _aligned_malloc may also be an option.
MREAL macros - when you need floating point arithmetic while assembling!

frktons

Quote from: jj2007 on December 23, 2012, 09:34:52 AM

We put together a testbed some time ago, and the results are somewhat confusing.
The MSDN rule of thumb seems to be OK, but it depends on other factors, too. Test it yourself...

Yeah!
My testbed also shows strange things:
Quote
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
277     cycles per kByte for HeapAlloc       00004000h bytes (16 kB)
1861    cycles per kByte for VirtualAlloc    00004000h bytes (16 kB)
92      cycles per kByte for GlobalAlloc     00004000h bytes (16 kB)

267     cycles per kByte for HeapAlloc       00008000h bytes (32 kB)
1555    cycles per kByte for VirtualAlloc    00008000h bytes (32 kB)
47      cycles per kByte for GlobalAlloc     00008000h bytes (32 kB)

321     cycles per kByte for HeapAlloc       00010000h bytes (64 kB)
1429    cycles per kByte for VirtualAlloc    00010000h bytes (64 kB)
27      cycles per kByte for GlobalAlloc     00010000h bytes (64 kB)

305     cycles per kByte for HeapAlloc       00020000h bytes (128 kB)
866     cycles per kByte for VirtualAlloc    00020000h bytes (128 kB)
11      cycles per kByte for GlobalAlloc     00020000h bytes (128 kB)

300     cycles per kByte for HeapAlloc       00040000h bytes (256 kB)
1219    cycles per kByte for VirtualAlloc    00040000h bytes (256 kB)
9       cycles per kByte for GlobalAlloc     00040000h bytes (256 kB)

1207    cycles per kByte for HeapAlloc       00080000h bytes (512 kB)
1205    cycles per kByte for VirtualAlloc    00080000h bytes (512 kB)
817     cycles per kByte for GlobalAlloc     00080000h bytes (512 kB)

820     cycles per kByte for HeapAlloc       00100000h bytes (1 MB)
823     cycles per kByte for VirtualAlloc    00100000h bytes (1 MB)
811     cycles per kByte for GlobalAlloc     00100000h bytes (1 MB)

Memory was touched

Probably from 1 MB upwards HeapAlloc becomes faster.

frktons

Quote from: hutch-- on December 23, 2012, 09:21:34 AM
Frank,

The only safe way with allocated memory is to allocate more than
you need then align the front and set a pointer to it. Then you don't
have to guess or hope that it will have the alignment you require on
different OS versions.

Yes Hutch, I'm taking this advice as a guideline for cases in
which it is not possible to be certain of the alignment.

Quote from: qWord on December 23, 2012, 09:35:56 AM
If already using the CRT, _aligned_malloc may also be an option.

A small example could help me to see the possibilities of this.