Assuming the OS is 64-bit Windows (not 32-bit), is it safe to assume that HeapAlloc will return memory aligned to 16 bytes?
Maybe for 64-bit code, but for 32-bit code it's still 8 bytes.
In my quick test, coded as a 64-bit app using Pelles C and running under Windows7-64, the alignment was 32 bits, strangely enough, but for higher alignments there are the _aligned_malloc and _aligned_offset_malloc functions, which should be readily callable from assembly code.
Edit: After my liver has had time to process the bottle of wine I drank, make that 32 bytes, and in further testing sometimes 16.
alignment.asm:
;----------------------------------
; poasm /AAMD64 /Gr alignment.asm
;----------------------------------
.CODE
_alignment PROC PARMAREA = 40
xor rax, rax ; prep for return zero
bsf rcx, rcx ; scan passed pointer from bit 0 for first set bit
jz @F ; return if no set bit
mov rax, 1 ; set bit 0
shl rax, cl ; shift left by index of first set bit
@@:
ret
_alignment ENDP
END
#include <windows.h>
#include <stdlib.h>
#include <stdio.h>
#include <conio.h>
//#include <malloc.h> // Per Pelles IDE "Use <stdlib.h> instead of non-standard <malloc.h>"
int _cdecl main(void)
{
int _alignment(void *);
void *p1 = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 1000000);
printf("%d\n", _alignment(p1));
void *p2 = _aligned_malloc(1000000, 128);
printf("%d\n", _alignment(p2));
HeapFree(GetProcessHeap(), 0, p1);
_aligned_free(p2);
_getch();
return(0);
}
32
256
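For anyone without an assembler handy, the same measurement can be done in plain C by isolating the lowest set bit of the address. A minimal sketch: pointer_alignment is a made-up name for illustration, not part of any library, and it assumes a pointer converts to uintptr_t as a flat address.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest power of two that divides the address; 0 for a zero address.
   Same idea as the bsf/shl pair in the assembly version. */
static size_t pointer_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return (size_t)(a & (~a + 1));   /* a & -a isolates the lowest set bit */
}
```

Passing it the result of HeapAlloc or _aligned_malloc should print the same kind of numbers as the test above; a result of 16 or more means the block is usable with aligned SSE loads.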
Sorry, didn't notice this question before.
Agner Fog has this to say on p. 120 of his manual "Optimizing C++", 2013 (you can google it):
Quote: 12.8 Aligning dynamically allocated memory
Memory allocated with new or malloc is typically aligned by 8 rather than by 16. This is a problem with vector operations when alignment by 16 is required. The Intel compiler has solved this problem by defining _mm_malloc and _mm_free.
This statement should apply to 64-bit Windows, since the document always mentions any differences between 32- and 64-bit OSes. Of course he could have missed it, but he's saying it's NOT safe to make your assumption. (No doubt new / malloc is calling HeapAlloc.) MichaelW's test indicates it's always aligned to 16; I don't suppose Pelles is doing that under the hood?
There's also the famous document "How to use Pageheap.exe in Windows XP, Win 2000, and Server 2003", an MS support page. I assume you're familiar with it, it's referenced all over the place. It says:
Quote: The Windows heap managers (all versions) have always guaranteed that the heap allocations have a start address that is 8-byte aligned (on 64-bit platforms the alignment is 16-bytes).
It also points out that this alignment can be circumvented:
Quote: NOTE: Some programs make assumptions about 8-byte alignment and they stop working correctly with the /unaligned parameter. Microsoft Internet Explorer is one such program.
But this doc is not official MS dogma, and it's pretty old.
As long as I'm on the topic, my notes from a while ago say this:
Quote: ...MSDN GlobalAlloc is the ONLY function that mentions alignment:
"Memory allocated with this function is guaranteed to be aligned on an 8-byte boundary." (applies to 32 and 64)
GlobalAlloc is strongly related to HeapAlloc. On a related page:
Quote: "Starting with 32-bit Windows, the global and local functions are implemented as wrapper functions that call the corresponding heap functions using a handle to the process's default heap."
I should re-check this but I'm too lazy; my notes are probably correct. Also, somewhere I saw that the minimum memory that can be allocated with 32-bit Windows is 8 bytes, but with 64-bit it's 16; I didn't note where. That might imply (if you're optimistic) that heap alignment also changed from 8 to 16.
Putting it all together, MS definitely does NOT guarantee 16-byte alignment (64-bit Win), but it may well be provided. To be safe I would use _aligned_malloc, as MichaelW suggests, or similar.
It's trivial to align memory; what's the big deal?
I'm just dumping everything my notes say on the subject - one of many subjects. If you're doing dynamic mem allocation with XMM it can be important, especially when using lib functions you can't directly control (with YMM, unaligned data is less of a problem). As usual the solution is to hand-code in assembler to get exactly what you want ... also, don't use dynamic allocation! I did at first but no longer; I use .data? with "align 16" (as appropriate) instead, use the same space for multiple structs when they don't conflict, re-use mem, put stuff on the stack, whatever's convenient. That eliminated various hard-to-trace bugs that cropped up with my amateur alloc'ing. But what do I know? - some people need aligned alloc, I suppose. "What's the big deal?" - don't ask me, ask an expert - e.g. yourself :biggrin:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
comment * -----------------------------------------------------
Build this template with
"CONSOLE ASSEMBLE AND LINK"
----------------------------------------------------- *
memalign MACRO reg, number
add reg, number - 1
and reg, -number
ENDM
alignby equ <64>
.code
start:
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
call main
inkey
exit
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
main proc
LOCAL hMem :DWORD
LOCAL pAlg :DWORD
mov hMem, alloc(8192+alignby)
mov eax, hMem
memalign eax, alignby
mov pAlg, eax ; pointer to aligned memory
print str$(pAlg),13,10
free hMem
ret
main endp
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
end start
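The same over-allocate-and-round-up trick reads like this in C. This is only a sketch of the technique in the macro above, with malloc standing in for the masm32 alloc macro; align_up and alloc_aligned are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to the next multiple of align (a power of two).
   Same arithmetic as the memalign macro: add align-1, then mask. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Hypothetical helper: over-allocate by align-1 bytes, return the
   aligned pointer, and pass the raw pointer back through *raw.
   The caller must free() the raw pointer, never the aligned one. */
static void *alloc_aligned(size_t size, size_t align, void **raw)
{
    *raw = malloc(size + align - 1);
    if (*raw == NULL)
        return NULL;
    return (void *)align_up((uintptr_t)*raw, (uintptr_t)align);
}
```

As in the assembly version, two pointers have to be kept around: the aligned one for the data and the original one for freeing. Padding by align-1 bytes is enough; the template above pads by the full alignby, which is harmless.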
There is also StackBuffer() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1255), which is guaranteed to be aligned for use with SIMD. As the name says, it uses the stack, and up to half a megabyte it is faster than HeapAlloc.
http://www.masmforum.com/board/index.php?topic=16837.msg140127#msg140127
Quote from: sinsi on February 07, 2015, 10:51:19 PM
http://www.masmforum.com/board/index.php?topic=16837.msg140127#msg140127
http://support.microsoft.com/kb/286470
Quote: on 64-bit platforms the alignment is 16-bytes
(Redmond speaking)
include \masm32\include\masm32rt.inc
.code
start: xor ebx, ebx
.Repeat
print str$(ebx), 9
invoke HeapAlloc, rv(GetProcessHeap), 0, 4
test al, 15
.if !Zero?
print hex$(eax), 9, "FOUL", 13, 10
.else
print hex$(eax), 9, "OK", 13, 10
.endif
inc ebx
.Until ebx>40
exit
end start
Win7-64:
15 002B3B50 OK
16 002B3B60 OK
17 002D50A8 FOUL
18 002D50D0 OK
19 002D50E0 OK
So the correct quote for our friends in Redmond should be:
Quote: on 64-bit platforms the alignment is 16-bytes, most of the time 8)
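A rough C counterpart of that loop, for trying the same experiment with other allocators. It is a sketch under assumptions: malloc stands in for HeapAlloc, count_misaligned is a made-up name, and the results depend entirely on the runtime and on whether the build is 32-bit or 64-bit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate count blocks of size bytes with malloc (a stand-in for
   HeapAlloc) and return how many start addresses are NOT multiples
   of align (a power of two). All blocks stay allocated until the end
   so the allocator cannot hand back the same address twice.
   Returns -1 if the bookkeeping array cannot be allocated. */
static int count_misaligned(int count, size_t size, uintptr_t align)
{
    void **blocks = malloc((size_t)count * sizeof *blocks);
    int i, foul = 0;

    if (blocks == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        if (blocks[i] != NULL && ((uintptr_t)blocks[i] & (align - 1)) != 0)
            foul++;                      /* the "FOUL" case above */
    }
    for (i = 0; i < count; i++)
        free(blocks[i]);                 /* free(NULL) is harmless */
    free(blocks);
    return foul;
}
```

Calling count_misaligned(41, 4, 16) mirrors the 0..40 loop with 4-byte requests; a zero result only says the allocator happened to return 16-byte-aligned blocks in that run, not that it guarantees them.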
out of curiosity....
what happens if you allocate sizes that are multiples of page size (4 KB) ?
i guess the heap is a collection of pages already allocated
I read that to mean 64-bit code...
Quote from: sinsi on February 08, 2015, 06:39:09 AM
I read that to mean 64-bit code...
"platform" is indeed a little bit ambiguous. Maybe somebody can test it with 64-bit code?
Allocating 55 bytes at a time
0000000000363D40
0000000000363D80
0000000000363DC0
0000000000363E00
0000000000363E40
0000000000363E80
0000000000363EC0
0000000000363F00
0000000000363F40
0000000000363F80
000000000039DE70
000000000039DEE0
000000000039DF20
000000000039DF60
000000000039DFA0
000000000039DFE0
000000000039E020
000000000039E060
000000000039E0A0
000000000039E0E0
000000000039E120
000000000039E160
000000000039E1A0
000000000039E1E0
000000000039E220
000000000039E260
000000000039E2A0
000000000039E2E0
000000000039E320
000000000039E360
000000000039E3A0
000000000039E3E0
@sinsi, jj2007 - proving once again, long posts don't get read :P I quoted that doc above (found it on the thread sinsi referenced). As I said, it's from 2003, and NOT official dogma - just a casual aside in a tutorial for XP. Doesn't apply to modern OS's. U can't trust such a ref.
@hutch - you're still wondering, who cares? when (as you show) it's a trivial problem. The word you may be looking for is "pedantic". When it comes to pedantry, I'm an oldbie
Quote from: rrr314159 on February 08, 2015, 09:19:06 AM
just a casual aside in a tutorial for XP. Doesn't apply to modern OS's. U can't trust such a ref.
It's a Microsoft KB, so you are saying "you can't trust Microsoft". I agree :biggrin:
So it implies you have to roll your own alloc if you want to use XMM regs in 32-bit code. Which is a bad idea anyway because xmm regs are being mercilessly destroyed by Win32 API calls such as HeapAlloc/ReAlloc/Free and MessageBox. And don't tell me "but that's documented in the ABI" unless you are able to provide a link to an official Microsoft page with the x86 ABI mentioning XMM regs ;-)
Maybe I suffer from a simple mind (KISS principle), but if you allocate and align memory and then plonk it into a pointer, who cares what happens to XMM registers after that? If you start passing around XMM registers as parameters you risk running into Microsoft not caring what happens to them. Much the same with FP registers, they get trashed routinely. I tend to use registers in a temporary manner, which avoids such problems.
Steve,
The only people who care about 16-byte alignment of buffers are those who use SIMD regularly. One can assume they don't like their active xmm regs being trashed by a bloody MessageBox. And saving 5 xmm regs is an issue, performance-wise; not for the MessageBox, of course, but there are also fast APIs. I remember we argued about uses esi edi ebx in WndProc ;-)
:biggrin:
Well, the simple answer is DON'T save them, abuse them like Microsoft do. :P