Author Topic: Testing Code Align64  (Read 1170 times)

Siekmanski

  • Member
  • *****
  • Posts: 1145
Testing Code Align64
« on: April 11, 2017, 11:21:44 AM »
Could you run this Align64 code-alignment test program and post the results?
jj2007 and nidud created this macro to align code to a 64-byte boundary.

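Align64 MACRO
  LOCAL num_nops
  num_nops = 64-($-_TEXT) and 63  ; padding to the next 64-byte offset within _TEXT (0 if already there)
  if num_nops
    db num_nops dup(90h)          ; fill the gap with NOP bytes (90h)
  endif
ENDM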

Because I didn't get the right results, jj2007 and I tested it with different combinations of MASM assemblers and linkers and, sadly, got different results.
To use this very useful macro I adapted it to my needs and wrote a test program to check whether it works with my assembler and linker combination.

Marinus

Code Align64 test.

Program Entry Point: 004010A0, 64 byte alignment: 32

Memory_1: 0040110F aligned to: 00401140  alignment: 00 '49 NOP(S) inserted.'
Memory_2: 0040126E aligned to: 00401280  alignment: 00 '18 NOP(S) inserted.'
Memory_3: 00401552 aligned to: 00401580  alignment: 00 '46 NOP(S) inserted.'
Memory_4: 00401C80 aligned to: 00401C80  alignment: 00 '00 NOP(S) inserted.'

Press any key to continue...
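
The padding counts in the output above can be checked by hand. The following is only a sketch in Python of the arithmetic the macro performs, not part of the test program. It applies the expression to the absolute addresses from the output, which assumes the code section itself starts on a 64-byte boundary (the macro really works on the offset `$-_TEXT`); the function name `nops_needed` is made up for this illustration.

```python
# Sketch of the Align64 padding arithmetic, checked against the reported output.
# Assumption: the code section base is a multiple of 64, so absolute addresses
# and offsets within _TEXT have the same low six bits.
CACHE_LINE = 64

def nops_needed(address: int) -> int:
    """Bytes of padding needed to reach the next 64-byte boundary (0 if aligned)."""
    return (CACHE_LINE - address) & (CACHE_LINE - 1)

# address before alignment -> NOP count reported by the test program
reported = {
    0x0040110F: 49,
    0x0040126E: 18,
    0x00401552: 46,
    0x00401C80: 0,
}

for address, nops in reported.items():
    pad = nops_needed(address)
    aligned = address + pad
    assert pad == nops                  # matches the 'NOP(S) inserted' column
    assert aligned % CACHE_LINE == 0    # matches 'alignment: 00'
    print(f"{address:08X} -> {aligned:08X} ({pad} NOPs)")
```

The same low-six-bits test explains the first output line: 004010A0 and 63 is 32, so the entry point itself is not 64-byte aligned.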

jj2007

  • Member
  • *****
  • Posts: 7758
  • Assembler is fun ;-)
    • MasmBasic
Re: Testing Code Align64
« Reply #1 on: April 11, 2017, 12:32:42 PM »
Program Entry Point: 004010A0, 64 byte alignment: 32

Memory_1: 0040110F aligned to: 00401140  alignment: 00 '49 NOP(S) inserted.'
Memory_2: 0040126E aligned to: 00401280  alignment: 00 '18 NOP(S) inserted.'
Memory_3: 00401552 aligned to: 00401580  alignment: 00 '46 NOP(S) inserted.'
Memory_4: 00401C80 aligned to: 00401C80  alignment: 00 '00 NOP(S) inserted.'

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4934
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Testing Code Align64
« Reply #2 on: April 11, 2017, 03:04:31 PM »
I must be a barbarian here: since the Core2 series of Intel hardware, code alignment almost never matters, and often when I have aligned labels the algo is slower. Data is another matter; if you want speed, data must be aligned. In 64 bit code the stack must be aligned to an interval of 16, and at least with MASM that is easy enough. The latest version of my prologue code (which I have not posted yet) is user defined and accepts any alignment that is a multiple of 16.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

jj2007

« Reply #3 on: April 11, 2017, 06:41:04 PM »
Quote from hutch--: "I must be a barbarian here, since the Core2 series of Intel hardware, almost exclusively code alignment does not matter and often when I have aligned labels the algo is slower."

True. In my experience, align 2 or align 4 before a loop can help a little bit, but not always. Before a proc, I always align 16 hoping that the code cache is being used more efficiently - and more specifically, to put timings on a comparable basis.

For the latter, align 64 would be nice, but so far the linker plays foul. I am curious to see what Marinus cooks up.

Siekmanski

« Reply #4 on: April 12, 2017, 06:42:52 AM »
Found it... at least I think so.
I had more than one code section in my program, and that messed it up.
So after all it is a reliable macro, as long as you keep that in mind.

jj2007

« Reply #5 on: April 12, 2017, 07:08:40 AM »
So what happens if you use a library??

Siekmanski

« Reply #6 on: April 12, 2017, 08:17:59 AM »
I'm now writing a speed test for code loops of 64 bytes or less, so that they fit in one code cache line.
The goal is to see whether we can benefit from the Align64 macro.
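
Why the start address matters for such a loop can be shown with a little arithmetic. This is a rough sketch in Python, assuming 64-byte cache lines; the helper `lines_touched` and the example addresses are invented for illustration and are not taken from the test program.

```python
# Sketch: how many 64-byte cache lines a block of code covers,
# depending on where it starts.
CACHE_LINE = 64

def lines_touched(start: int, size: int) -> int:
    """Number of cache lines covered by `size` bytes of code starting at `start`."""
    first = start // CACHE_LINE
    last = (start + size - 1) // CACHE_LINE
    return last - first + 1

print(lines_touched(0x00401140, 64))  # starts on a boundary: 1 line
print(lines_touched(0x00401150, 64))  # starts 16 bytes past a boundary: 2 lines
print(lines_touched(0x00401150, 48))  # shorter loop still fits: 1 line
```

A loop of exactly 64 bytes only fits in a single line when it starts on a 64-byte boundary, which is what align 4 or align 16 cannot guarantee.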

But the linker messed things up again and the program entry moved up 16 bytes, just from adding more code.  :icon_eek:
I can still use the macro by changing this line:

 num_nops = 64-($-_TEXT) and 63
to:
 num_nops = 48-($-_TEXT) and 63

Later I will test it in a library.
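
The change from 64 to 48 is what you would expect if the code that `_TEXT` refers to now starts 16 bytes past a 64-byte boundary. The sketch below, in Python, checks that reasoning. The two base addresses are hypothetical, since the thread does not say where the linker actually placed the section contribution, and the function names are made up for this example.

```python
# Sketch: why replacing 64 with 48 restores alignment after a 16-byte shift.
CACHE_LINE = 64

def padding(offset: int, constant: int) -> int:
    """Mirror of the macro expression `constant-($-_TEXT) and 63`."""
    return (constant - offset) & (CACHE_LINE - 1)

def constant_for(base: int) -> int:
    """Constant that compensates for a base that is not a multiple of 64."""
    return CACHE_LINE - (base % CACHE_LINE)

def aligned_everywhere(base: int, constant: int) -> bool:
    """True if every padded offset lands on a 64-byte boundary in memory."""
    return all((base + off + padding(off, constant)) % CACHE_LINE == 0
               for off in range(4096))

base_aligned = 0x00401000  # hypothetical: base on a 64-byte boundary
base_shifted = 0x00401010  # hypothetical: base moved up by 16 bytes

assert aligned_everywhere(base_aligned, 64)       # original macro works
assert not aligned_everywhere(base_shifted, 64)   # breaks after the shift
assert aligned_everywhere(base_shifted, 48)       # the tweak fixes it
assert constant_for(base_shifted) == 48
```

This also shows the weak spot of the approach: the constant depends on where the linker puts the code, so it has to be rechecked whenever that placement changes.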

Siekmanski

« Reply #7 on: April 12, 2017, 09:15:44 AM »
Here is a test piece with source code to check out the Align64 macro.
It can be used to align code to 64 bytes so that the code cache can execute it faster.
As an example I used a code loop from my FFT routine to test whether it runs faster with it.
And it does!  :eusa_dance:

Align64 speed test by Siekmanski.

ProgramEntry: 00401050
BufferStart:  003D0000
CodeAlign64 test:   00

Timing starts now:

1124 Cycles for Test_Align4
1102 Cycles for Test_Align16
1023 Cycles for Test_Align64

jj2007

« Reply #8 on: April 12, 2017, 10:20:43 AM »
About 4% faster compared to align 16, and about 9% faster compared to align 4 :t
Align64 speed test by Siekmanski.

ProgramEntry: 00401050
BufferStart:  00250000
CodeAlign64 test:   00

Timing starts now:

1281 Cycles for Test_Align4
1216 Cycles for Test_Align16
1168 Cycles for Test_Align64

hutch--

« Reply #9 on: April 12, 2017, 12:24:40 PM »
There is an option in a COFF object module to align the code in the module, but it's not easy to get at. With "fda.exe", the data module tool I have in MASM32, it's a simple option: you just include a number, as long as it is a power of 2, and it works fine. I guess it would require a dedicated tool to modify an existing object module, but that may be a way to get reliable code alignment greater than 16 bytes.

Mikl__

  • Member
  • ****
  • Posts: 556
Re: Testing Code Align64
« Reply #10 on: April 12, 2017, 12:31:24 PM »
Hi, Siekmanski!
align 64 is ((-$+_TEXT)and 63)

Siekmanski

« Reply #11 on: April 12, 2017, 02:24:10 PM »
jj2007, it seems we can use it to speed things up.  :t

Hi Hutch,
That's an option, but it would be quite a workaround. With this macro you can adjust the constant to the program entry point when the program is ready to be released, and then use it just like align 4 and align 16. Once it is assembled and linked, it works.

But the most important question is: is it faster on your PC than align 16?

Hi Mikl__
((-$+_TEXT)and 63) gives this: error A2071:initializer magnitude too large for specified size
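
As plain arithmetic the two expressions agree, which the sketch below checks in Python. It says nothing about why MASM rejects the negated form: Python integers have arbitrary precision and its `&` treats negative numbers as infinite two's complement, which may not match how the assembler evaluates the constant. The function names are made up for this comparison.

```python
# Sketch: (64 - offset) and 63 versus (-offset) and 63, compared numerically.
def pad_original(offset: int) -> int:
    """Expression used in the Align64 macro."""
    return (64 - offset) & 63

def pad_negated(offset: int) -> int:
    """Expression suggested by Mikl__, with offset standing in for $-_TEXT."""
    return (-offset) & 63

# Both forms give the same padding for every offset tried here.
assert all(pad_original(o) == pad_negated(o) for o in range(1 << 16))
print(pad_original(0x10F), pad_negated(0x10F))  # both 49
```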

hutch--

« Reply #12 on: April 12, 2017, 02:59:24 PM »
Marinus,

Same results with multiple tests.


Align64 speed test by Siekmanski.

ProgramEntry: 00401050
BufferStart:  005D0000
CodeAlign64 test:   00

Timing starts now:

832 Cycles for Test_Align4
819 Cycles for Test_Align16
775 Cycles for Test_Align64

Press any key to continue...

Siekmanski

« Reply #13 on: April 12, 2017, 03:02:08 PM »
Thanks.  :eusa_dance:

TWell

  • Member
  • ****
  • Posts: 748
Re: Testing Code Align64
« Reply #14 on: April 12, 2017, 03:46:12 PM »
Cheap AMD
Align64 speed test by Siekmanski.

ProgramEntry: 00401050
BufferStart:  00360000
CodeAlign64 test:   00

Timing starts now:

887 Cycles for Test_Align4
777 Cycles for Test_Align16
736 Cycles for Test_Align64
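
To compare the four posted runs on one scale, the cycle counts can be turned into relative savings. This is a small Python sketch using only the numbers posted in this thread; "saving" here means the percentage of cycles Test_Align64 needs less than the other variant, and the function name is made up for the example.

```python
# Sketch: relative cycle savings of Test_Align64 for each posted result.
# Tuples are (Test_Align4, Test_Align16, Test_Align64) cycle counts.
results = {
    "Siekmanski": (1124, 1102, 1023),
    "jj2007":     (1281, 1216, 1168),
    "hutch--":    (832, 819, 775),
    "TWell":      (887, 777, 736),
}

def saving(baseline: int, aligned64: int) -> float:
    """Percentage of cycles saved by the 64-byte aligned version."""
    return 100.0 * (baseline - aligned64) / baseline

for name, (a4, a16, a64) in results.items():
    print(f"{name:10s} vs align 4: {saving(a4, a64):4.1f}%   "
          f"vs align 16: {saving(a16, a64):4.1f}%")
```

These are single runs on four different machines, so the percentages are only a rough indication.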