MSVCRT aligned malloc

Started by MichaelW, July 28, 2012, 05:46:26 PM

MichaelW

The attachment is a test of the _aligned_malloc, _aligned_realloc, and _aligned_free functions from the later versions of MSVCRT.DLL. I found an appropriate version of MSVCRT.DLL on my Windows XP SP3 system, and for want of a better naming system I included the DLL version in the name of the import library.

http://msdn.microsoft.com/en-us/library/8z34s9c6(v=VS.80).aspx

Despite what the linked page states regarding compatibility, the MSVCRT.DLL on my Windows 2000 system does not export these functions.

BTW, on my Windows 2000 system the minimum alignment for malloc is 8, but on my Windows XP system it's apparently 16.
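
For anyone who wants to reproduce the alignment column without the attachment, here is a minimal C sketch. It assumes the second number in the listings in this thread is the largest power of two that divides the address, which matches pairs such as 385f00h 256 and 3862f0h 16. The name address_alignment is made up for illustration and is not an MSVCRT function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of two that divides addr, i.e. the alignment the
   address happens to have. Hypothetical helper, not part of any CRT. */
static size_t address_alignment(uintptr_t addr)
{
    assert(addr != 0);                    /* zero has no lowest set bit */
    return (size_t)(addr & (~addr + 1u)); /* isolate the lowest set bit */
}
```

Calling it as address_alignment((uintptr_t)malloc(n)) for a range of sizes would give the kind of address/alignment pairs posted here. A single result only shows what one particular call returned, not what the allocator guarantees.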

Well Microsoft, here's another nice mess you've gotten us into.

Vortex

Here is my test result. OS : XP SP3


385f00h 256
3862f0h 16
3866e0h 32
386ad0h 16

386ed0h 16
3872d0h 16
3876e0h 32
387ae0h 32
387ef0h 16
3882f0h 16
388700h 256
388b00h 256
388f10h 16
389310h 16

389740h 64
389b80h 128
389fc0h 64
38a3c0h 64
38a800h 2048
38ac40h 64
38b080h 128
38b4c0h 64
38b900h 256
38bd40h 64

Press any key to continue ...

jj2007

XP SP3 - seems they finally found out that SSE2 exists :icon_mrgreen:
333b70h
334f80h
335770h

333b60h 32
332430h 16
334f70h 16
335360h 32
335750h 16
335b40h 64
335f30h 16
336320h 32
336710h 16
336b00h 256

336f00h 256
337300h 256
337710h 16
337b10h 16
337f20h 32
338320h 32
338730h 16
338b30h 16
338f40h 64
339340h 64

339780h 128
339b80h 128
339fc0h 64
33a400h 1024
33a840h 64
33ac80h 128
33b0c0h 64
33b500h 256
33b940h 64
33bd40h 64

Gunther

Michael,

good job. Here are my results:


33af0h
34f00h
356f0h

33ae0h 32
34ef0h 16
352e0h 32
356d0h 16
35ac0h 64
35eb0h 16
362a0h 32
36690h 16
36a80h 128
36e70h 16

37270h 16
37670h 16
37a80h 128
37e80h 128
38290h 16
38690h 16
38aa0h 32
38ea0h 32
392b0h 16
396b0h 16

39ac0h 64
39f00h 256
3a340h 64
3a780h 128
3abc0h 64
3b000h 4096
3b440h 64
3b840h 64
3bc80h 128
3c0c0h 64


Gunther
You have to know the facts before you can distort them.

hutch--


332440h
3349a0h
335190h

332430h 16
334990h 16
334d80h 128
335170h 16
335560h 32
335950h 16
335d40h 64
336130h 16
336520h 32
336910h 16

336d10h 16
337110h 16
337520h 32
337920h 32
337d30h 16
338130h 16
338540h 64
338940h 64
338d50h 16
339150h 16

339580h 128
3399c0h 64
339e00h 512
33a200h 512
33a640h 64
33aa80h 128
33aec0h 64
33b300h 256
33b740h 64
33bb80h 128

Press any key to continue ...

jj2007

It seems I was wrong - it's HeapAlloc, not VirtualAlloc...

77BFC3B4     │.  83C6 0F                 add esi, 0F
77BFC3B7     │.  83E6 F0                 and esi, FFFFFFF0
77BFC3BA     │>  56                      push esi                                ; ┌Size
77BFC3BB     │.  6A 00                   push 0                                  ; │Flags = 0
77BFC3BD     │.  FF35 1824C377           push dword ptr [77C32418]               ; │Heap = 00330000
77BFC3C3     │.  FF15 F410BE77           call near [<&KERNEL32.HeapAlloc>]       ; └NTDLL.RtlAllocateHeap
77BFC3C9     │>  E8 8DB00000             call 77C0745B
77BFC3CE     └.  C3                      retn
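
The add/and pair at the top of the listing rounds the size in ESI up to the next multiple of 16 before it is pushed as the Size argument for HeapAlloc. A minimal C sketch of the same arithmetic follows; round_up is a hypothetical helper name, and the power-of-two restriction is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align.
   align must be a power of two; with align = 16 this is the same
   arithmetic as "add esi, 0F" followed by "and esi, FFFFFFF0". */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + (align - 1)) & ~(align - 1);
}
```

Note that this rounds the requested size, not the returned address. Whether the block itself starts on a 16-byte boundary still depends on what the heap does.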


KeepingRealBusy

Here are the results from my new laptop (Win 7):

7c2510h
7c3920h
7c4110h

7c2500h 256
7c3910h 16
7c3d00h 256
7c40f0h 16
7c44e0h 32
7c48d0h 16
7c4cc0h 64
7c50b0h 16
7c54a0h 32
7c5890h 16

7c5c90h 16
7c6090h 16
7c64a0h 32
7c68a0h 32
7c6cb0h 16
7c70b0h 16
7c74c0h 64
7c78c0h 64
7c7cd0h 16
7c80d0h 16

7c8500h 256
7c8940h 64
7c8d80h 128
7c9180h 128
7c95c0h 64
7c9a00h 512
7c9e40h 64
7ca280h 128
7ca6c0h 64
7cab00h 256

qWord

Win7, x64
232a20h
233e30h
234620h

232a18h 8
233e28h 8
234218h 8
234608h 8
2349f8h 8
234de8h 8
2351d8h 8
2355c8h 8
2359b8h 8
235da8h 8

2361a0h 32
2365b0h 16
2369b0h 16
236dc0h 64
2371c0h 64
2375d0h 16
2379d0h 16
237de0h 32
2381e0h 32
2385f0h 16

238a00h 512
238e40h 64
239280h 128
2396c0h 64
239b00h 256
239f40h 64
23a340h 64
23a780h 128
23abc0h 64
23b000h 4096

Press any key to continue ...
MREAL macros - when you need floating point arithmetic while assembling!

jj2007

232a18h 8

*slaps thigh* :greensml:

One more "pleasant" surprise in Win 7-64 (see the trashes xmm regs thread)

sinsi

Weird results

running from within winrar
2329d0h
233de0h
2345d0h

2329c8h 8
230f00h 256
233dd8h 8
2341c8h 8
2345b8h 8
2349a8h 8
234d98h 8
235188h 8
235578h 8
235968h 8

235d60h 32
236170h 16
236570h 16
236980h 128
236d80h 128
237190h 16
237590h 16
2379a0h 32
237da0h 32
2381b0h 16

2385c0h 64
238a00h 512
238e40h 64
239280h 128
2396c0h 64
239b00h 256
239f00h 256
23a340h 64
23a780h 128
23abc0h 64

running from desktop
332950h
333d60h
334550h

332940h 64
330f00h 256
333d50h 16
334140h 64
334530h 16
334920h 32
334d10h 16
335100h 256
3354f0h 16
3358e0h 32

335ce0h 32
3360e0h 32
3364f0h 16
3368f0h 16
336d00h 256
337100h 256
337510h 16
337910h 16
337d20h 32
338120h 32

338540h 64
338980h 128
338dc0h 64
339200h 512
339640h 64
339a40h 64
339e80h 128
33a2c0h 64
33a700h 256
33ab40h 64

jj2007

I am using gsl_matrix_alloc and would like to speed up its operations a little bit with movaps. In short: I need a 16-byte aligned matrix.

Under the hood, gsl_matrix_alloc is using malloc, so I googled once again hoping to get clear information. Here it is:

1. Visual Studio 2005, malloc Alignment:
Quote: malloc is required to return memory on a 16-byte boundary.

Great, short and crispy, and exactly what I wanted to know :t
But....

2. Visual Studio 2012, malloc Alignment:
Quote: malloc is guaranteed to return memory that's aligned on a boundary that's suitable for storing any object that could fit in the amount of memory that's allocated. For example, a four-byte allocation would be aligned on a boundary that supports any four-byte or smaller object. Memory alignment on a boundary that's suitable for a larger object than will fit in the allocation is not guaranteed.

WTF does that M$ gobbledygook, pardon: "documentation", mean? If my "object" is a matrix of REAL8s, then malloc will align on an 8-byte boundary? Then thanks, M$, for forcing me to use movups :eusa_boohoo:

P.S.: Question from a "power poster" at CodeGuru:

> consider that you want to allocate memory block exactly in an address that is divisible by 16.
We know this already. The question is -- why do you want to do this?
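
When the allocation is under your own control, one common way to get a 16-byte boundary out of a plain malloc is to over-allocate and round the address up, keeping the original pointer just below the aligned block so it can be freed later. A minimal C sketch of that technique follows. The names aligned_alloc_sketch and aligned_free_sketch are hypothetical, this is not how MSVCRT's _aligned_malloc or GSL are known to be implemented, and it assumes the alignment is a power of two no smaller than a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: return size bytes aligned to align, or NULL. */
static void *aligned_alloc_sketch(size_t size, size_t align)
{
    /* assumption: align is a power of two and at least sizeof(void *) */
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

    /* payload + worst-case padding + room for the saved raw pointer */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;
    unsigned char *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* leave room for the saved pointer, then round the address up */
    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);

    ((void **)aligned)[-1] = raw;   /* remember what malloc returned */
    return (void *)aligned;
}

/* Hypothetical helper: release a block from aligned_alloc_sketch. */
static void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

A block obtained with aligned_alloc_sketch(n, 16) must be released with aligned_free_sketch, never with free directly. The cost is up to align + sizeof(void *) extra bytes per allocation.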

FORTRANS

Hi,

   As predicted, it did not run on Windows 2000.  Ran it three times
on Windows XP.  The first run was different from the other two.

Regards,

Steve N.


Microsoft Windows XP [Version 5.1.2600]

A:\>test
3337e0h
334bf0h
3353e0h

3323a8h 8
3337d0h 16
334be0h 32
334fd0h 16
3353c0h 64
3357b0h 16
335ba0h 32
335f90h 16
336380h 128
336770h 16

336b70h 16
336f70h 16
337380h 128
337780h 128
337b90h 16
337f90h 16
3383a0h 32
3387a0h 32
338bb0h 16
338fb0h 16

3393c0h 64
339800h 2048
339c40h 64
33a080h 128
33a4c0h 64
33a900h 256
33ad40h 64
33b140h 64
33b580h 128
33b9c0h 64

Press any key to continue ...

A:\>test
3337f0h
334c00h
3353f0h

3323a8h 8
3337e0h 32
334bf0h 16
334fe0h 32
3353d0h 16
3357c0h 64
335bb0h 16
335fa0h 32
336390h 16
336780h 128

336b80h 128
336f80h 128
337390h 16
337790h 16
337ba0h 32
337fa0h 32
3383b0h 16
3387b0h 16
338bc0h 64
338fc0h 64

339400h 1024
339800h 2048
339c40h 64
33a080h 128
33a4c0h 64
33a900h 256
33ad40h 64
33b180h 128
33b5c0h 64
33b9c0h 64

Press any key to continue ...

A:\>test
3337f0h
334c00h
3353f0h

3323a8h 8
3337e0h 32
334bf0h 16
334fe0h 32
3353d0h 16
3357c0h 64
335bb0h 16
335fa0h 32
336390h 16
336780h 128

336b80h 128
336f80h 128
337390h 16
337790h 16
337ba0h 32
337fa0h 32
3383b0h 16
3387b0h 16
338bc0h 64
338fc0h 64

339400h 1024
339800h 2048
339c40h 64
33a080h 128
33a4c0h 64
33a900h 256
33ad40h 64
33b180h 128
33b5c0h 64
33b9c0h 64

Press any key to continue ...

A:\>

dedndave

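    mov     edx,esp                     ; start from the current stack pointer
    sub     edx,<sizeof desired buffer> ; make room for the buffer
    and     dl,-16                      ; round the address down to a multiple of 16
    .repeat                             ; probe the stack down to the buffer address
        push    eax                     ; write just below ESP; at the stack limit this hits the guard page
        mov     esp,fs:[8]              ; ESP = current stack limit from the TIB
    .until edx>=esp                     ; stop once the limit is at or below the buffer address
    mov     esp,edx
;EDX = ESP = 16-aligned buffer address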

TouEnMasm

                Intel(R) Celeron(R) CPU 2.80GHz
Microsoft Windows XP Home Edition Build Service Pack 3 2600


Quote
34010h
35420h
35c10h

34008h  8
35418h  8
35808h  8
35bf8h  8
35fe8h  8
363d8h  8
367c8h  8
36bb8h  8
36fa8h  8
37398h  8

37790h  16
37ba0h  32
37fa0h  32
383b0h  16
387b0h  16
38bc0h  64
38fc0h  64
393d0h  16
397d0h  16
39be0h  32

3a000h  8192
3a440h  64
3a880h  128
3acc0h  64
3b0c0h  64
3b500h  256
3b940h  64
3bd80h  128
3c1c0h  64
3c600h  512

Press any key to continue ...
Fa is a musical note to play with CL

MichaelW

Since malloc has no way of knowing what you are storing in the buffer, conforming to my interpretation of the Visual Studio 2012 statement quoted above would require it to base the alignment on the specified size. Under Windows XP SP3, whether I link with MSVCRT or LIBC (MSVC++ Toolkit 2003), malloc does not align based on the size. For _aligned_malloc the alignment argument is not optional, and specifying zero results in an alignment of 4.

Perhaps a later version of MSVCRT or LIBC will produce different results.
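
To make the size-based reading concrete, here is a minimal C sketch of the alignment that reading would imply: the largest power of two that does not exceed the requested size, capped at some maximum. Both the reading and the cap are assumptions for illustration, and implied_alignment is a hypothetical name, not a CRT function.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment implied by a size-based reading of the malloc wording:
   the largest power of two <= size, capped at max_align.
   max_align must be a power of two (assumption of this sketch). */
static size_t implied_alignment(size_t size, size_t max_align)
{
    assert(size != 0 && max_align != 0 &&
           (max_align & (max_align - 1)) == 0);
    size_t a = 1;
    while (a <= size / 2 && a < max_align)
        a *= 2;                 /* double while the result still fits */
    return a;
}
```

Under this reading a four-byte request would only need a 4-byte boundary, which is the case the quoted wording uses as its example. Comparing implied_alignment(n, 16) against the observed address alignment for a range of n is one way to check whether a given runtime aligns by size.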

I did not have a Microsoft import library for MSVCRT, so I used the one from the attachment at the top of this thread, renamed to msvcrt.lib and placed in my working directory (included in attachment).