Title: Aligning memory for later instructions.
Post by: hutch-- on August 25, 2016, 10:28:20 PM

Simple enough to do and necessary for XMM and YMM operations.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

include \masm32\include64\masm64rt.inc

.code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL pMem :QWORD                   ; allocated memory pointer
    LOCAL aMem :QWORD                   ; aligned memory pointer

    padd equ <512>                      ; extra bytes (must be at least size of required alignment)
    bcnt equ <1024*1024*4>              ; 4 meg

    mov pMem, alloc(bcnt+padd)          ; allocate the memory plus padding
    memalign rax, 256                   ; align the memory up to the next 256 byte boundary
    mov aMem, rax                       ; store result in aligned memory pointer

  ; do what you need with the 256 byte aligned memory (YMM addresses, register etc ....)

    mfree pMem                          ; free the original allocated address

    waitkey

    invoke ExitProcess,0

    ret

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

Title: Re: Aligning memory for later instructions.
Post by: K_F on September 09, 2016, 06:32:41 AM

Doing something similar with dynamic pointer allocations... but using bit/and comparisons ?

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 09, 2016, 06:51:43 AM

For the amount of memory usually needed for working efficiently with YMM regs, wouldn't VirtualAlloc be a good choice?

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 10, 2016, 04:29:35 PM

:biggrin:

Once memory is allocated, it's all the same, whatever floats your boat. :P

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 11, 2016, 10:20:39 PM

Yes, but with VirtualAlloc you get the alignment "for free" ;-)

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 15, 2016, 03:16:03 PM

Another trick while tweaking the main include file, mis-align at least some structures and it crashes when the procedure that tries to use it is called. 8 byte alignment is necessary with most and probably all structures used in 64 bit.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 15, 2016, 08:27:03 PM

Quote from: hutch-- on September 15, 2016, 03:16:03 PM
Another trick while tweaking the main include file, mis-align at least some structures and it crashes when the procedure that tries to use it is called. 8 byte alignment is necessary with most and probably all structures used in 64 bit.

Indeed. The Zp switch is the way to go...

Quote from: jj2007 on September 13, 2016, 10:28:02 PM
And that one works fine if you build it with Zp4 for 32-bit and Zp8 for 64-bit code, but it crashes for X64 and Zp4.

Which means that the default structure alignment of the Windows API is DWORD in 32-bit code and QWORD in 64-bit code. Both on a 64-bit processor, of course (see Reply #30); Redmond should take their documentation a bit more seriously 8)

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 15, 2016, 08:36:54 PM

:biggrin:

This is only a problem if you are trying to multiport similar assemblers. ML64 is NOT MASM compatible, it is only ML64 compatible. Its error messages while organising include files are unintelligible and buggy, and at times it crashes with stack dumps. It does not have the tolerance that the old ML had and is a genuine joy to make working include files for structures and equates.

Now if you have a look at the guts of Japheth's h2incX output you will have genuine nightmares at this tangled mess of typedefs, prototypes, bugs, equates, the odd "tag" attached to the front of structures and with a possible delivery date at about the year 3000.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 15, 2016, 09:08:45 PM

Quote from: hutch-- on September 15, 2016, 08:36:54 PM
if you have a look at the guts of Japheth's h2incX output you will have genuine nightmares at this tangled mess of typedefs, prototypes, bugs, equates, ...

The C++ faction will insist that the 100+ types are necessary. IMHO the only real change from Windows.inc+WinExtra.inc is the distinction between "data" DWORDs (they can stay "as is") and "pointers" that are DWORDs in 32-bit code, and QWORDs in 64-bit code. In \Masm32\MasmBasic\Res\DualWin.inc the latter is called SIZE_P, and its size depends on whether you build 32- or 64-bit code, obviously. Otherwise, there are only minor changes compared to Windows.inc+WinExtra.inc - a few structure members choked with ML64.

So, give DualWin.inc a try - no tangled mess of typedefs, prototypes, bugs, equates, just the old Windows.inc format.

Besides, it runs with all assemblers. So if you are fond of unintelligible and buggy error messages and stack dumps, use ML64; if instead you like the old .if eax>99 etc. syntax, you can use the same include file with HJWasm.

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 16, 2016, 08:50:56 AM

:biggrin:

> So if you are fond of unintelligible and buggy error messages and stack dumps, use ML64

I don't have the problem, I am not trying to use Japheth's includes. With ML64 I am free of "Open Sauce" licencing and the army of parasites that come with it. :badgrin:

Title: Re: Aligning memory for later instructions.
Post by: mineiro on September 17, 2016, 09:29:14 AM

Quote from: hutch-- on September 16, 2016, 08:50:56 AM
With ML64 I am free of "Open Sauce" licencing and the army of parasites that come with it. :badgrin:

Come on, sir hutch, tell us the truth: you probably have a lot of "open sauce" software on your computer, from PDF readers to disassemblers. Just look at the libraries.

But I take your point. I don't like open source either, for one reason: it has lawyers inside. This is why I prefer public domain.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 17, 2016, 12:33:59 PM

Quote from: hutch-- on September 16, 2016, 08:50:56 AM
:biggrin:

> So if you are fond of unintelligible and buggy error messages and stack dumps, use ML64

I don't have the problem, I am not trying to use Japheth's includes. With ML64 I am free of "Open Sauce" licencing and the army of parasites that come with it. :badgrin:

Me neither. I am trying to use the standard Masm32 includes. Unfortunately, they make Microsoft ML64 crash with exceptions.

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 17, 2016, 12:54:48 PM

That is because they were written for 32 bit ML.EXE, they are not compatible with ML64. I would not be doing the work if it was. Shortly I will have another tool that isolates the prototypes in the Microsoft vc2015 header files and that will be a method of creating prototypes for 64 bit assembler. I don't need it for ML64 but it will be useful for the assemblers that need prototypes. I doubt you can get an auto converter to work on the C .H files, they are too much of a tangled mess and too much useless noise but you can get equates, structures and prototypes which will ease the production of assembler include files.

I have attached a zip file with a C header file cleaner in it. A lot more needs to be done with it but it converts C hex to asm hex, removes the comments and a lot of the junk. Just drop a C header file onto it and it will produce a text file "cleaned.txt" that is at least readable.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 17, 2016, 06:07:30 PM

Quote from: hutch-- on September 17, 2016, 12:54:48 PM
That is because they were written for 32 bit ML.EXE, they are not compatible with ML64. I would not be doing the work if it was. Shortly I will have another tool that isolates the prototypes in the Microsoft vc2015 header files and that will be a method of creating prototypes for 64 bit assembler. I don't need it for ML64

You may compare your results to the attached \Masm32\MasmBasic\Res\pt.inc, which gets generated if somebody attempts to run the MasmBasic 64-bit examples.

Quote from: jj2007 on July 21, 2016, 04:58:40 PM
Hutch,

The error checking of invoke is relevant for this type of case:

invoke CreateFile, esi, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0, 0

And for nothing else. I guess we can safely assume that everybody can write their own proc with named parameters like hWnd and the like. If then somebody is thick enough to add a fifth parameter to the PROTO list, he should really go for Scratch or Logo 8)

Btw Microsoft's CrippleWare Assembler^TM can be convinced to count the paras. I am currently a bit stuck with the PROLOGUE bug (http://masm32.com/board/index.php?topic=5528.0), but with ML64 jinvoke works like a charm :icon_mrgreen:

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 18, 2016, 09:47:08 AM

This looks like an interesting technique.

j@BitBlt equ 2069/41:s111111111

Just a suggestion, instead of just supplying the argument count, use a 1 character notation for the data size.

q = QWORD
d = DWORD
w = WORD
b = BYTE

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 18, 2016, 10:20:57 AM

See e.g. j@VariantTimeToSystemTime equ 8777/89:s39
oleaut32.inc: VariantTimeToSystemTime PROTO :REAL8,:PTR SYSTEMTIME

Most args are DWORD in 32-bit code (even those that should be REAL4, see GdiPlus.inc). DWORDs can become SIZE_P for dual 64/32-bit assembly - the stack is organised in DWORD resp. QWORD slots, so even if (in 64-bit code) the arg should be "only" DWORD according to the C header file, it will do no harm to declare it a QWORD.

The conversion code starts in line 43ff of \Masm32\MasmBasic\Res\GetPT.asm

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 18, 2016, 12:15:31 PM

Interestingly enough, this works OK.

PPROTO TYPEDEF PTR PROC

MessageBox MACRO args:VARARG
    externdef __imp_MessageBoxA:PPROTO
    IF argcount(args) NE 4
      echo *********************************************
      echo MessageBox MACRO arg count error, 4 expected
      echo *********************************************
      .err
    ENDIF
    invoke __imp_MessageBoxA,args
ENDM

It's just that I could not be bothered, I have enjoyed working without prototypes.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 18, 2016, 07:23:09 PM

Erol and Paul had an alternative approach (http://www.masmforum.com/board/index.php?topic=8863.msg64295#msg64295), as you may remember:

EXTERNDEF MessageBox@16:PROC
MessageBox EQU <invoke pr4 PTR MessageBox@16>

My version would be

jinvoke MessageBox, 0, Str$(rax), Chr$("Title"), MB_OK

based on

jd@130 equ user32
...
j@MessageBoxA equ 12837/130:s1111

where 12837 is a global counter, 130 is the ID of the DLL, and s1111 means stdcall, 4*DWORD (or QWORD in 64-bit)

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 18, 2016, 07:57:48 PM

This will build but will not start and does not recognise an extra argument. It is old left over code that worked on 32 bit ML using a macro I designed a long time ago that is still in the 32 bit windows.inc file.

EXTERNDEF MessageBox@16:PROC
MessageBox EQU <invoke pr4 PTR MessageBox@16>

This method of prototyping does not work in ML64.

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 18, 2016, 08:27:38 PM

In the C++ header file this is the prototype.

WINOLEAUTAPI_(INT) VariantTimeToSystemTime(__in DOUBLE vtime, __out LPSYSTEMTIME lpSystemTime);
vtime is a DOUBLE
lpSystemTime is a QWORD pointer to a structure

Both are 64 bit values.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 18, 2016, 09:08:38 PM

Quote from: hutch-- on September 18, 2016, 08:27:38 PM
In the C++ header file this is the prototype.

WINOLEAUTAPI_(INT) VariantTimeToSystemTime(__in DOUBLE vtime, __out LPSYSTEMTIME lpSystemTime);
vtime is a DOUBLE
lpSystemTime is a QWORD pointer to a structure

Both are 64 bit values.

Indeed. Little test:

include \Masm32\MasmBasic\Res\JBasic.inc ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
.data?
MyR8 REAL8 ?
MyST SYSTEMTIME <>

Init
PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format")
jinvoke SetLastError, 0
jinvoke GetLocalTime, addr MyST
jinvoke SystemTimeToVariantTime, addr MyST, addr MyR8
jinvoke VariantTimeToSystemTime, MyR8, addr MyST
deb 4, "Result", eax, MyR8, MyST.wDay, MyST.wMonth, MyST.wYear
Inkey Err$()
EndOfCode

Output:

This code was assembled with ml64 in 64-bit format
Result
eax     1
MyR8    42631.545960648
MyST.wDay       18
MyST.wMonth     9
MyST.wYear      2016

Operazione completata.

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 18, 2016, 10:13:50 PM

Beware of being trapped in the past, long mode /LARGEADDRESSAWARE is the future and the native data size for Win64 is 64 bit.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 18, 2016, 11:02:15 PM

Quote from: hutch-- on September 18, 2016, 10:13:50 PM
Beware of being trapped in the past, long mode /LARGEADDRESSAWARE is the future and the native data size for Win64 is 64 bit.

Can you please elaborate on the relevance of your statement for the VariantTimeToSystemTime example? Or for any other 64-bit example I ever posted here?

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 18, 2016, 11:48:17 PM

Quote
Most args are DWORD in 32-bit code (even those that should be REAL4, see GdiPlus.inc). DWORDs can become SIZE_P for dual 64/32-bit assembly - the stack is organised in DWORD resp. QWORD slots, so even if (in 64-bit code) the arg should be "only" DWORD according to the C header file, it will do no harm to declare it a QWORD.

According to the C++ header file, both values are 64 bit. If you want to write reliable code you will get the data sizes right. You may get away with 2 x 32 bit values but you are at risk with the future. The other test is if your code builds successfully with the linker option /LARGEADDRESSAWARE. If not you are tied to the past using 32 bit values.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 19, 2016, 12:50:12 AM

All my 64-bit code builds fine with /large.

	int 3
	jinvoke VariantTimeToSystemTime, MyR8, addr MyST
	nop

translates to:

CC                         | int3                                          |
48 8D 15 8C 21 00 00       | lea rdx, qword ptr ds:[140003268]             | pointer to SYSTEMTIME
48 8B 0D 7D 21 00 00       | mov rcx, qword ptr ds:[140003260]             | 40E4D0F6612F684C, double 42631.701273148
FF 15 9F 24 00 00          | call qword ptr ds:[<&VariantTimeToSystemTime> |
90                         | nop                                           |

Of course, with OPT_64 0, i.e. 32-bit assembly, there will be two DWORD pushes for the REAL8:

CC                  int3
68 107BD400         push offset MyST
FF35 0C7BD400       push dword ptr [0D47B0C]
FF35 087BD400       push dword ptr [MyR8]
FF15 887CD400       call near [0D47C88]
90                  nop

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 19, 2016, 08:07:00 AM

The 64 bit code looks simple enough as it's only 3 register args but there is no gain in using the wrong data size in the register when the spec is 64 bit. I don't know what the value is in the ml64 subforum for the 32 bit code, it's been around since the middle 1990s.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 19, 2016, 08:30:13 AM

Where am I "using the wrong data size in the register when the spec is 64 bit"??? ::)

Title: Re: Aligning memory for later instructions.
Post by: MichaelW on September 19, 2016, 10:30:58 AM

Why not use the aligned malloc (https://msdn.microsoft.com/en-us/library/8z34s9c6.aspx) family of functions? The attachment contains a demo done in C and GAS assembly.

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 19, 2016, 10:45:41 AM

Michael,

You will probably like this.

; --------------------------------------------------------
; alignment must be an immediate operand and a power of 2
; when no longer required the original address must be
; freed with either GlobalFree() or the macro "mfree".
; --------------------------------------------------------
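aalloc MACRO pmem:REQ,bcnt:REQ,alignment:REQ
    mov rdx, bcnt                           ; requested byte count
    add rdx, alignment                      ; plus slack for the round-up
    mov rcx, GMEM_FIXED or GMEM_ZEROINIT    ; fixed, zero filled memory
    call GlobalAlloc
    mov pmem, rax                           ; keep the original address for freeing
    add rax, alignment - 1                  ; round up to the next boundary
    and rax, -alignment
    EXITM <rax>                             ; aligned address is returned in rax
ENDM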

Your suggestion is a good idea though, I have seen the function but have not had the time to try it out yet.

I have a longer version as well that will take memory operands for the alignment.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 19, 2016, 07:48:34 PM

Since HeapAlloc returns align 8, and align 16 is what you need for SIMD, a thin wrapper around HeapAlloc and HeapFree is another option. 8 bytes extra per call, obviously.

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 19, 2016, 08:10:52 PM

Over time I have learnt that Microsoft have changed the default alignment of various memory allocation strategies so for reliable operation with whatever strategy you choose, manually controlling the memory alignment is the only safe technique. As per Michael's suggestion, the CRT aligned memory is a viable technique that does work OK for exactly the same reason, you can directly control the alignment and not make assumptions about what the default may happen to be.

For SSE you need 128 byte alignment, AVX requires 256 byte alignment and AVX2 512 byte alignment.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 19, 2016, 11:05:44 PM

Quote from: hutch-- on September 19, 2016, 08:10:52 PM
Over time I have learnt that Microsoft have changed the default alignment of various memory allocation strategies

See screenshot below from the 1994 TechEd Conference (https://en.wikipedia.org/wiki/TechEd#Dates_and_Locations_of_TechEd_Events). M$ may have had good intentions, but (test attached) GlobalAlloc is align 8 on XP and Win7-64 alike, exactly as for HeapAlloc 8)

Quote
For SSE you need 128 byte alignment

The great majority of SSE instructions is happy with align 16 or no alignment at all. Or did you mean 128 bits?

Title: Re: Aligning memory for later instructions.
Post by: nidud on September 19, 2016, 11:26:06 PM

deleted

Title: Re: Aligning memory for later instructions.
Post by: nidud on September 20, 2016, 12:50:18 AM

deleted

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 20, 2016, 01:05:16 AM

> The great majority of SSE instructions is happy with align 16 or no alignment at all. Or did you mean 128 bits?

This is the Intel manual.
The 128-bit (V)MOVNTDQA addresses must be 16-byte aligned or the instruction will cause a #GP.
The 256-bit VMOVNTDQA addresses must be 32-byte aligned or the instruction will cause a #GP.
The 512-bit VMOVNTDQA addresses must be 64-byte aligned or the instruction will cause a #GP.

This was a blunder, tired and too much work.
> For SSE you need 128 byte alignment, AVX requires 256 byte alignment and AVX2 512 byte alignment.

It should be,
For SSE you need 128 BIT alignment, AVX requires 256 BIT alignment and AVX2 512 BIT alignment.

Title: Re: Aligning memory for later instructions.
Post by: MichaelW on September 20, 2016, 03:07:56 AM

At least under Windows 7-64 and Windows 10-64, for the aligned malloc functions a 16-byte alignment is the minimum actual alignment. There are also the _aligned_offset_malloc functions that allow you to specify the alignment of a specific offset in the allocated memory. IIRC they were not supported under Windows XP, but are under Windows 7-64.

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 20, 2016, 09:18:05 AM

Quote from: hutch-- on September 20, 2016, 01:05:16 AM
tired and too much work

Slow down, man. You are the Masm32 BDFL anyway, even if you don't finish the 64-bit version by tomorrow ;-)

Still 32-bit, almost plain HeapAlloc under the hood:

include \masm32\MasmBasic\MasmBasic.inc ; Version 20 September 2016 (http://masm32.com/board/index.php?topic=94.0)
Init
Dim PtrSSE() As DWORD
For_ ct=0 To A16Max-1 ; 100 aligned pointers
Alloc16 Rand(10000)
movaps [eax], xmm0 ; the proof ;-)
mov PtrSSE(ct), eax
Print Hex$(al), " "
Next
For_ ct=0 To A16Max-1
Free16 PtrSSE(ct)
Next
Inkey "OK?"
EndOfCode

Output:

50 20 20 A0 80 40 70 30 10 20 70 90 F0 A0 30 20 00 20 50 50
B0 C0 50 50 40 80 F0 70 D0 B0 40 E0 A0 C0 30 70 10 F0 70 E0
80 20 C0 60 A0 E0 10 00 70 10 D0 B0 00 90 20 B0 90 70 00 90
30 90 B0 30 00 60 C0 C0 10 10 B0 50 F0 60 C0 F0 B0 E0 10 90
C0 D0 F0 60 00 30 F0 A0 C0 A0 10 A0 90 30 80 A0 F0 E0 10 B0 OK?

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 20, 2016, 09:41:24 PM

I don't claim to understand your notation, but if I have it right, why not make a version where you can set the alignment to any power of 2 you like, so you can also handle AVX and AVX2?

Title: Re: Aligning memory for later instructions.
Post by: nidud on September 22, 2016, 11:41:19 PM

deleted

Title: Re: Aligning memory for later instructions.
Post by: jj2007 on September 23, 2016, 09:32:53 AM

Quote from: nidud on September 22, 2016, 11:41:19 PM
Using the stack is way faster than using HeapAlloc.

That's correct, and StackBuffer() (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1255) proves it, but a HeapAlloc-based macro as shown above is normally fast enough, and not limited to the procedure where it was called.

Title: Re: Aligning memory for later instructions.
Post by: hutch-- on September 23, 2016, 10:36:26 AM

I generally choose dynamic memory allocation when I need large single memory blocks, which I generally chop up into the sizes I need. I have seen code where massive counts of small allocations occur, but it's lousy code design and often very slow. Stack is easy and fast but I only use it for relatively small amounts, a few K here and there. You can alter the linker option on stack reserve/stack commit if you want a lot more stack space.

The MASM Forum

Microsoft 64 bit MASM => MASM64 SDK => Topic started by: hutch-- on August 25, 2016, 10:28:20 PM