HASM 2.32 Release

Started by johnsa, May 16, 2017, 09:11:28 PM

jj2007

Quote from: johnsa on May 23, 2017, 06:11:01 PM
I've never tested align 32 under 32-bit, to be honest; it's always worked flawlessly for me on 64-bit. I would think that each section would be loaded into memory by the OS loader on page boundaries (4096).

Attention, section <> segment. Although... see attachment (plain asm).

Quote from: aw27 on May 23, 2017, 07:16:41 PM
We are concerned with the default data alignment within a section not the whole section itself. The default is paragraph (16 bytes). To change that behaviour we can not use the simplified directive

That's the whole point indeed. When using align 32 in a .data? segment, MASM chokes because this is above the default alignment of 16. Question is why the Watcom family doesn't complain.

johnsa

Well, intra-section alignment should just work: if the start of the section is aligned to at least 32, you should be able to use align X for any X <= 4096 (assuming the section is page aligned).
I will run some more tests in 32-bit to see how that works, as it's definitely fine in 64-bit. Are you also running on a 32-bit OS? (The OS loader may also affect this, as opposed to running a 32-bit app on a 64-bit OS.)
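
The arithmetic behind this can be checked outside the assembler. The following is only a minimal Python sketch of the address calculation, not of what any assembler or linker actually does; the two base addresses and the helper names are made up for illustration. It shows that an `align 32` inside a section only yields a 32-byte-aligned address when the start of that section contribution is itself aligned to at least 32.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def address_after_align(section_base: int, offset: int, alignment: int) -> int:
    """Address of the next datum after an `align` directive inside the section."""
    return section_base + align_up(offset, alignment)


# Hypothetical start addresses, chosen only for illustration.
page_aligned_base = 0x403000   # starts on a 4096-byte boundary
para_aligned_base = 0x403010   # starts on a 16-byte (para) boundary only

for base in (page_aligned_base, para_aligned_base):
    # One DWORD at offset 0, then `align 32`, then the next variable.
    addr = address_after_align(base, 4, 32)
    print(hex(addr), "32-byte aligned:", addr % 32 == 0)
```

With the page-aligned base the variable lands on a 32-byte boundary; with the para-aligned base it ends up 16 bytes off, even though the offset inside the section is a multiple of 32.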

nidud

#47
deleted

nidud

#48
deleted

johnsa

I guess we can just change simseg.c to write out align(32) instead of para, as it seems like a perfectly legitimate option under all conditions?

nidud

#50
deleted

johnsa

I guess we could do that too.
It just seems like yet another setting you have to change to obtain the desired behaviour when using align 32. It doesn't seem to affect the outcome in 64-bit, so I assume that, PARA or not, the whole section is in memory on a 64 KB or page boundary at least.

So.. command line switch or the proposed workaround posted here? - I'll go with whatever we decide here (in the spirit of what we discussed earlier!) :)

aw27

Quote from: TWell on May 23, 2017, 06:25:27 PM
With:
MbData SEGMENT ALIGN(32) ALIAS('.data')

This appears to be a good point, particularly when linking with an HLL, because it will prevent multiple section names.
"
ALIAS( string )
This string is used as the section name in the emitted COFF object. Creates multiple sections with the same external name, with distinct MASM segment names.
"

In summary:
_DATA1 SEGMENT ALIGN(32) FLAT 'DATA' ALIAS('.data')



johnsa

Quote from: jj2007 on May 23, 2017, 07:23:07 PM
That's the whole point indeed. When using align 32 in a .data? segment, MASM chokes because this is above the default alignment of 16. Question is why the Watcom family doesn't complain.

JWasm did complain; we removed the warning about align 32, as I use it all the time for AVX stuff (but 64-bit only). The segment's para setting seems to have little effect in 64-bit, as I've not once had an issue with align 32 there.

johnsa

So,

command line option -align:X    set segment alignment
default is para(16) if not specified, and we'll just apply that to simseg?

nidud

#55
deleted

jj2007

Quote from: johnsa on May 24, 2017, 01:27:44 AM
It just seems like yet another setting you have to change to obtain the desired behaviour when using align 32.

Indeed. There are people who try to remain compatible with the "old" Masm world, and adding new commandline options can be an obstacle to switching.

Just found another incompatibility. This builds fine in ML 8.0 ... 10.0:

include \masm32\include\masm32rt.inc      ; plain Masm32
MbData SEGMENT align(32) 'data'      ; MSDN: SEGMENT
myvarA      dd ?
align 32
myvarB      dd ?
; MbData ENDS

.code
      nop

MbData segment
align 32
myvarC      dd ?     

other SEGMENT ALIGN(32)
align 32
myvarO      dd ?     
other ENDS

.code
start:
  mov eax, offset myvarB
  sub eax, offset myvarA
  print hex$(eax), "h bytes difference B-A", 13, 10
  mov eax, offset myvarC
  sub eax, offset myvarA
  print hex$(eax), "h bytes difference C-A", 13, 10
  mov eax, offset myvarO
  sub eax, offset myvarA
  inkey hex$(eax), "h bytes difference O-A", 13, 10
  exit
end start
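
For reference, the offsets one would expect inside MbData can be worked out by hand. Below is a minimal Python sketch that models the location counter, under two assumptions that are not confirmed by the post: the assembler honours every `align 32`, and reopening `MbData` continues the location counter where the segment left off. The `layout` helper and the directive tuples are hypothetical, purely for illustration. The O-A difference is left out because it depends on where the linker places the `other` segment.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(directives):
    """Walk (kind, value) pairs and return {name: offset} for each variable."""
    offsets, counter = {}, 0
    for kind, value in directives:
        if kind == "align":
            counter = align_up(counter, value)
        else:  # ("dd", name): a 4-byte variable
            offsets[value] = counter
            counter += 4
    return offsets


mbdata = [
    ("dd", "myvarA"),
    ("align", 32),
    ("dd", "myvarB"),
    # MbData reopened after .code; location counter assumed to continue
    ("align", 32),
    ("dd", "myvarC"),
]

offs = layout(mbdata)
print(f"{offs['myvarB'] - offs['myvarA']:08X}h bytes difference B-A")
print(f"{offs['myvarC'] - offs['myvarA']:08X}h bytes difference C-A")
```

Under those assumptions the model gives 20h for B-A and 40h for C-A; an assembler that reports something else is either padding differently or not continuing the segment.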

hutch--

This is a bit off topic but it's close to the subject. It has been many years since I have used full segments in a MASM executable, so I had to test whether they still work in 64-bit MASM. Seems so. Here is a test piece that does nothing but test whether a 4096-byte-aligned ymm variable is writable.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    aln4096 SEGMENT Align(4096)
      align 4096
      .ymm0 ymmword ?
      .ymm1 ymmword ?
      .ymm2 ymmword ?
      .ymm3 ymmword ?
      .ymm4 ymmword ?
      .ymm5 ymmword ?
      .ymm6 ymmword ?
      .ymm7 ymmword ?
      .ymm8 ymmword ?
      .ymm9 ymmword ?
      .ymmA ymmword ?
      .ymmB ymmword ?
      .ymmC ymmword ?
      .ymmD ymmword ?
      .ymmE ymmword ?
      .ymmF ymmword ?
    aln4096 ENDS

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL ymmvar    :QWORD

    mov ymmvar, 1234567890

    vmovntdqa ymm0, YMMWORD PTR [rcx+r10]
    vmovntdq .ymm0, ymm0

    waitkey
    invoke ExitProcess,0

    ret

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end


Raistlin

I found some non-scholarly articles on the alignment issue: what it is, why we need it, best practice, implications, etc.

Here are the links; perhaps they provide some elucidation.

https://msdn.microsoft.com/en-us/library/Aa290049(VS.71).aspx

http://www.gamasutra.com/view/feature/3942/data_alignment_part_1.php?print=1

http://www.agner.org/optimize/optimizing_assembly.pdf

http://technion.ac.il/doc/intel/compiler_c/main_cls/intref_cls/common/intref_alignment_support.htm#intref_alignment_support

http://www.obpm.org/download/Intro_to_Intel_AVX.pdf

jj2007

Thanks for the links. The Agner doc has 331 instances of the word align, but most concern the 'forced' alignments through 64-bit ABI and SIMD instructions. No real info on performance penalties, except for ... the 'modern' P4 processor:
Quote
The performance penalty for level-1 cache line contention can be quite considerable on older microprocessors, but on newer processors such as the P4 we are losing only a few clock cycles because the data are likely to be prefetched from the level-2 cache, which is accessed quite fast through a full-speed 256 bit data bus. The improved efficiency of the level-2 cache in the P4 compensates for the smaller level-1 data cache.

On a side note (related to occasional debates whether being stingy about code size is good or bad):
Quote
11.3 μop cache
The Intel Sandy Bridge and later processors have a traditional code cache before the decoders and a μop cache after the decoders. The μop cache is a big advantage because instruction decoding is often a bottleneck on Intel processors. The capacity of the μop cache is much smaller than the capacity of the level-1 code cache. The μop cache is such a critical resource that the programmer should economize its use and make sure the critical part of the code fits into the μop cache. One way to economize trace cache use is to avoid loop unrolling.