Topic: Aligning memory for later instructions.

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 10583
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Aligning memory for later instructions.
« Reply #30 on: September 19, 2016, 08:10:52 PM »
Over time I have learnt that Microsoft have changed the default alignment of various memory allocation strategies, so for reliable operation with whatever strategy you choose, manually controlling the memory alignment is the only safe technique. As per Michael's suggestion, the CRT aligned memory functions are a viable technique that works for exactly the same reason: you directly control the alignment and make no assumptions about what the default may happen to be.

For SSE you need 128 byte alignment, AVX requires 256 byte alignment and AVX2 512 byte alignment.
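A minimal sketch of what "manually controlling the memory alignment" can look like in portable C: over-allocate, round the address up to the next multiple of the alignment, and store the original pointer just below the aligned block so it can be freed later. The names aligned_alloc_manual and aligned_free_manual are hypothetical, and this is one common approach under stated assumptions, not the method used by any particular Windows API or by the MASM32 library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `size` bytes whose address is a multiple of
   `align`. Assumes `align` is a power of two and at least sizeof(void *).
   Returns NULL on bad arguments or allocation failure. */
static void *aligned_alloc_manual(size_t size, size_t align)
{
    if (align < sizeof(void *) || (align & (align - 1)) != 0)
        return NULL;
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;                          /* size computation would overflow */

    /* Extra room: worst-case padding plus one slot for the raw pointer. */
    unsigned char *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the pointer slot, then round up to the next multiple of align. */
    uintptr_t start   = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Remember the address malloc really returned, just below the block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Frees a block obtained from aligned_alloc_manual; NULL is ignored. */
static void aligned_free_manual(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Because the caller passes the alignment explicitly, the result does not depend on whatever default the underlying allocator happens to use.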
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

jj2007

  • Member
  • *****
  • Posts: 13957
  • Assembly is fun ;-)
    • MasmBasic
Re: Aligning memory for later instructions.
« Reply #31 on: September 19, 2016, 11:05:44 PM »
> Over time I have learnt that Microsoft have changed the default alignment of various memory allocation strategies

See the screenshot below from the 1994 TechEd Conference. M$ may have had good intentions, but (test attached) GlobalAlloc returns 8-byte-aligned memory on XP and Win7-64 alike, exactly as HeapAlloc does 8)

> For SSE you need 128 byte alignment

The great majority of SSE instructions is happy with align 16 or no alignment at all. Or did you mean 128 bits?
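A test of an allocator's default alignment, like the one mentioned above for GlobalAlloc and HeapAlloc, can be sketched in portable C against malloc. The helper name observed_malloc_alignment is hypothetical, and the sketch only reports what was observed for a handful of blocks; it proves nothing about what an allocator guarantees.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `count` blocks of varying size and returns
   the largest power of two that divides every returned address.
   Returns 0 if count is 0 or an allocation fails. */
static size_t observed_malloc_alignment(size_t count)
{
    if (count == 0)
        return 0;
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return 0;

    uintptr_t bits = 0;          /* OR of all addresses seen so far */
    size_t made = 0;
    for (; made < count; made++) {
        blocks[made] = malloc(1 + made * 7);      /* odd, varying sizes */
        if (blocks[made] == NULL)
            break;
        bits |= (uintptr_t)blocks[made];
    }

    int ok = (made == count);
    while (made > 0)
        free(blocks[--made]);
    free(blocks);

    /* The lowest set bit of the OR is the common alignment of all blocks. */
    return ok ? (size_t)(bits & (~bits + 1)) : 0;
}
```

A result of 8 would mean at least one block was only 8-byte aligned, which is not enough for an aligned 16-byte SSE load or store.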

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: Aligning memory for later instructions.
« Reply #32 on: September 19, 2016, 11:26:06 PM »
deleted
« Last Edit: February 25, 2022, 12:03:06 PM by nidud »

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: Aligning memory for later instructions.
« Reply #33 on: September 20, 2016, 12:50:18 AM »
deleted
« Last Edit: February 25, 2022, 12:03:16 PM by nidud »

hutch--

Re: Aligning memory for later instructions.
« Reply #34 on: September 20, 2016, 01:05:16 AM »
> The great majority of SSE instructions is happy with align 16 or no alignment at all. Or did you mean 128 bits?

This is from the Intel manual:
The 128-bit (V)MOVNTDQA addresses must be 16-byte aligned or the instruction will cause a #GP.
The 256-bit VMOVNTDQA addresses must be 32-byte aligned or the instruction will cause a #GP.
The 512-bit VMOVNTDQA addresses must be 64-byte aligned or the instruction will cause a #GP.

This was a blunder, tired and too much work.
> For SSE you need 128 byte alignment, AVX requires 256 byte alignment and AVX2 512 byte alignment.

It should be:
For SSE you need 128 BIT (16-byte) alignment, AVX and AVX2 require 256 BIT (32-byte) alignment and AVX-512 requires 512 BIT (64-byte) alignment.
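The bit-versus-byte mix-up is easy to avoid by converting once and checking pointers in bytes. The sketch below is illustrative only; required_alignment_bytes and is_aligned are hypothetical names, and the 16/32/64-byte figures are the ones quoted from the Intel manual for the 128/256/512-bit forms of (V)MOVNTDQA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a vector width in bits (128, 256, 512) expressed as
   the byte alignment that the aligned load/store forms require. */
static size_t required_alignment_bytes(unsigned vector_bits)
{
    return vector_bits / 8;
}

/* Hypothetical helper: true if p is a multiple of `align`.
   Assumes `align` is a power of two. */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

For example, required_alignment_bytes(128) is 16, so an address ending in hex 0 passes the SSE check, while an address ending in hex 8 does not.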



MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1196
Re: Aligning memory for later instructions.
« Reply #35 on: September 20, 2016, 03:07:56 AM »
At least under Windows 7-64 and Windows 10-64, for the aligned malloc functions a 16-byte alignment is the minimum actual alignment. There are also the _aligned_offset_malloc functions that allow you to specify the alignment of a specific offset in the allocated memory. IIRC they were not supported under Windows XP, but are under Windows 7-64.
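The idea of aligning a specific offset inside the block, rather than its start, can be illustrated with a portable sketch. This is not the CRT implementation of _aligned_offset_malloc; offset_aligned_alloc and offset_aligned_free are hypothetical names, and the argument checks are assumptions made for this example. The returned pointer p is chosen so that p + offset is a multiple of the alignment, which is useful when a header of `offset` bytes precedes data that must be aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns p such that (p + offset) is a multiple of
   `align` (a power of two). The raw malloc pointer is saved just below p.
   Returns NULL on bad arguments or allocation failure. */
static void *offset_aligned_alloc(size_t size, size_t align, size_t offset)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;
    if (offset != 0 && offset >= size)
        return NULL;
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;                          /* size computation would overflow */

    void *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Round the address of (block start + offset) up to a multiple of align. */
    uintptr_t target = (uintptr_t)raw + sizeof(void *) + offset;
    target = (target + align - 1) & ~(uintptr_t)(align - 1);

    /* Step back by offset to get the start of the user block. */
    unsigned char *user = (unsigned char *)raw
                        + (size_t)(target - offset - (uintptr_t)raw);

    /* user itself may be misaligned for a pointer, so copy bytewise. */
    memcpy(user - sizeof raw, &raw, sizeof raw);
    return user;
}

/* Frees a block obtained from offset_aligned_alloc; NULL is ignored. */
static void offset_aligned_free(void *p)
{
    if (p != NULL) {
        void *raw;
        memcpy(&raw, (unsigned char *)p - sizeof raw, sizeof raw);
        free(raw);
    }
}
```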
Well Microsoft, here’s another nice mess you’ve gotten us into.

jj2007

Re: Aligning memory for later instructions.
« Reply #36 on: September 20, 2016, 09:18:05 AM »
> tired and too much work

Slow down, man. You are the Masm32 BDFL anyway, even if you don't finish the 64-bit version by tomorrow ;-)

Still 32-bit, almost plain HeapAlloc under the hood:

include \masm32\MasmBasic\MasmBasic.inc      ; Version 20 September 2016
  Init
  Dim PtrSSE() As DWORD
  For_ ct=0 To A16Max-1      ; 100 aligned pointers
      Alloc16 Rand(10000)
      movaps [eax], xmm0      ; the proof ;-)
      mov PtrSSE(ct), eax
      Print Hex$(al), " "
  Next
  For_ ct=0 To A16Max-1
      Free16 PtrSSE(ct)
  Next
  Inkey "OK?"
EndOfCode


Output:
Code: [Select]
50 20 20 A0 80 40 70 30 10 20 70 90 F0 A0 30 20 00 20 50 50 B0 C0 50 50 40
80 F0 70 D0 B0 40 E0 A0 C0 30 70 10 F0 70 E0 80 20 C0 60 A0 E0 10 00 70 10
D0 B0 00 90 20 B0 90 70 00 90 30 90 B0 30 00 60 C0 C0 10 10 B0 50 F0 60 C0
F0 B0 E0 10 90 C0 D0 F0 60 00 30 F0 A0 C0 A0 10 A0 90 30 80 A0 F0 E0 10 B0
OK?

hutch--

Re: Aligning memory for later instructions.
« Reply #37 on: September 20, 2016, 09:41:24 PM »
I don't claim to understand your notation, but if I have it right, why not make a version where you can set the alignment to any power-of-2 size you like, so you can also handle AVX and AVX2?
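A version with a freely chosen power-of-2 alignment might be sketched in C on top of the C11 standard function aligned_alloc. The wrapper name alloc_pow2_aligned is hypothetical. Note the assumption that a C11 runtime is available: Microsoft's CRT does not provide aligned_alloc and offers _aligned_malloc and _aligned_free instead, so on Windows this is only an illustration of the interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocates at least `size` bytes aligned to `align`,
   where `align` is any power of two (16 for SSE, 32 for AVX/AVX2, ...).
   Release the block with free(). Returns NULL on bad arguments or failure. */
static void *alloc_pow2_aligned(size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;                       /* not a power of two */
    if (size > SIZE_MAX - (align - 1))
        return NULL;                       /* rounding would overflow */

    /* C11 wants the size to be a multiple of the alignment, so round up. */
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded == 0)
        rounded = align;

    return aligned_alloc(align, rounded);
}
```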

nidud

  • Member
  • *****
  • Posts: 2388
    • https://github.com/nidud/asmc
Re: Aligning memory for later instructions.
« Reply #38 on: September 22, 2016, 11:41:19 PM »
deleted
« Last Edit: February 25, 2022, 12:03:28 PM by nidud »

jj2007

Re: Aligning memory for later instructions.
« Reply #39 on: September 23, 2016, 09:32:53 AM »
> Using the stack is way faster than using HeapAlloc.

That's correct, and StackBuffer() proves it, but a HeapAlloc-based macro as shown above is normally fast enough, and not limited to the procedure where it was called.
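The trade-off can be sketched in C11, where an aligned stack buffer costs no allocator call at all but disappears when the function returns. This is only an illustration of the general idea; it says nothing about how StackBuffer() is implemented, and stack_buffer_is_aligned is a hypothetical name.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical demo: reserves a 32-byte-aligned scratch buffer on the stack,
   touches it, and reports whether its address is a multiple of 32.
   The buffer is gone once the function returns, so it must not be handed
   to code that outlives this call. */
static int stack_buffer_is_aligned(void)
{
    alignas(32) unsigned char buf[256];   /* compiler aligns the stack slot */
    memset(buf, 0, sizeof buf);
    return ((uintptr_t)buf & 31) == 0;
}
```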

hutch--

Re: Aligning memory for later instructions.
« Reply #40 on: September 23, 2016, 10:36:26 AM »
I generally choose dynamic memory allocation when I need large single memory blocks, which I then chop up into pieces of whatever sizes I need. I have seen code where massive counts of small allocations occur, but it's lousy code design and often very slow. Stack is easy and fast, but I only use it for relatively small amounts, a few K here and there. You can alter the linker option on stack reserve/stack commit if you want a lot more stack space.
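Chopping one large block into pieces is often done with a simple bump allocator, where each piece is aligned as it is carved off and everything is released in one go. The sketch below is a generic illustration under that assumption, not the author's code; Arena, arena_init, arena_alloc and arena_free are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena: one big block, handed out piece by piece. */
typedef struct {
    unsigned char *base;   /* start of the big block */
    size_t size;           /* total bytes in the block */
    size_t used;           /* bytes already handed out, padding included */
} Arena;

/* Makes the single large allocation; returns 1 on success, 0 on failure. */
static int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = (a->base != NULL) ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Carves `size` bytes aligned to `align` (a power of two) from the arena.
   Returns NULL if the arguments are bad or the arena is exhausted. */
static void *arena_alloc(Arena *a, size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;

    uintptr_t cur     = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (cur + align - 1) & ~(uintptr_t)(align - 1);
    size_t pad        = (size_t)(aligned - cur);   /* bytes skipped to align */

    if (pad > a->size - a->used || size > a->size - a->used - pad)
        return NULL;

    a->used += pad + size;
    return a->base + (a->used - size);
}

/* Releases every piece at once by freeing the big block. */
static void arena_free(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

There is one call into the system allocator no matter how many pieces are carved out, which is the point being made about avoiding massive counts of small allocations.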