HASM 2.32 Release

Started by johnsa, May 16, 2017, 09:11:28 PM


Raistlin

One way to economize trace cache use is to avoid loop unrolling :icon_eek: :icon_exclaim:

OK, didn't see that! Thanks as always, JJ2007. It seems counterintuitive, as this was the bread and butter
of optimization until now. Is there a graphical (diagram) way to represent this? Just for understanding...

[EDIT: Never mind it all makes sense - just had to read it twice]
Are you pondering what I'm pondering? It's time to take over the world ! - let's use ASSEMBLY...

jj2007

Quote from: Raistlin on May 24, 2017, 04:30:50 PM
seems counter intuitive as this was the bread and butter of optimization until now.

Unrolling helps every now and then; in case of doubt, time it :P

My experience is mixed: Sometimes a very short loop can be made a bit faster, but equally often it gets slower.

Apparently shorter code is good for battery life (AnandTech):
Quote
Sandy Bridge introduced the 1.5K µop cache that caches decoded micro-ops. When future instruction fetch requests are made, if the instructions are contained within the µop cache everything north of the cache is powered down and the instructions are serviced from the µop cache. The decode stages are very power hungry so being able to skip them is a boon to power efficiency.

Siekmanski

@ Raistlin

Here I tested a piece of looped code that fits in 1 cache line on a 64 byte aligned code memory boundary. It executes faster.
http://masm32.com/board/index.php?topic=6141.msg65172#msg65172
Creative coders use backward thinking techniques as a strategy.
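
Whether a piece of looped code stays inside one cache line depends only on its start address and its size in bytes. A minimal C sketch of that check, assuming 64-byte cache lines (an assumption, though typical for current x86 parts); the function name is made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache line size in bytes */

/* Returns true if the byte range [start, start + size) lies inside a
   single cache line, i.e. its first and last byte share a line index. */
bool fits_in_one_line(uintptr_t start, size_t size)
{
    assert(size > 0);
    return start / CACHE_LINE == (start + size - 1) / CACHE_LINE;
}
```

For example, a 40-byte loop that starts 0x30 bytes into a line straddles two lines, while the same loop placed on a 64-byte boundary fits in one.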

johnsa

Quote from: nidud on May 24, 2017, 06:39:14 AM
A bit long. /Sa is used.

We change the maximum alignment of segments with the n value, which is required to be a power of two and specifies the new alignment in bytes. The same n is also used as pack(n) in C for structure alignment.


/Zp[n] Set structure alignment
/Sp[n] Set segment alignment


So, Segment Packing maybe?
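
For readers who have not met pack(n) on the C side, here is a small sketch of what structure packing does to member offsets. The struct names are invented for illustration, and the expected sizes assume a typical x86/x64 compiler where a 32-bit integer is naturally 4-byte aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: the compiler inserts 3 padding bytes after 'tag'
   so that 'value' starts on a 4-byte boundary. */
struct natural {
    uint8_t  tag;
    uint32_t value;
};

/* pack(1): members are placed on 1-byte boundaries, so no padding. */
#pragma pack(push, 1)
struct packed1 {
    uint8_t  tag;
    uint32_t value;
};
#pragma pack(pop)

/* Checks the layouts described above (typical x86/x64 ABI assumed). */
void check_layout(void)
{
    assert(offsetof(struct natural, value) == 4);
    assert(sizeof(struct natural) == 8);
    assert(offsetof(struct packed1, value) == 1);
    assert(sizeof(struct packed1) == 5);
}
```

The segment switch discussed here applies the same "n must be a power of two" idea to whole segments rather than to structure members.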

That works for me, I'm happy with /Sp.
I think what we should also do is emit a warning if align 32 (anything > 16) is used in the section and either /Sp is not specified or the align value is greater than the /Sp value.
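
The proposed warning rule can be written down as a small predicate. This is only a sketch of the rule as stated in this post, not UASM's actual code; the function name, the convention that 0 means "/Sp not given", and the constant name are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define DEFAULT_SEG_ALIGN 16u /* alignment assumed when /Sp is absent */

static bool is_pow2(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* requested: operand of the ALIGN directive inside the section.
   sp_value:  value passed to /Sp, or 0 if the switch was not given.
   Returns true if the assembler should warn, because the requested
   alignment exceeds what the segment itself guarantees. */
bool align_needs_warning(unsigned requested, unsigned sp_value)
{
    unsigned seg_align = sp_value ? sp_value : DEFAULT_SEG_ALIGN;
    assert(is_pow2(requested) && is_pow2(seg_align));
    return requested > seg_align;
}
```

With this rule, align 32 without /Sp warns, align 32 with /Sp32 does not, and align 64 with /Sp32 warns again.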

nidud

#64
deleted

jj2007

Relics from old DOS times? JWasm issues a warning, yes, HJWasm doesn't, but both produce correct code.

include \masm32\include\masm32rt.inc

.data
db "123", 0
align 64
MyArray db 25, 18, 23, 17, 9, 2, 6
align 64
HelloW$ db "Hello World", 0

.code
start:
  mov ecx, offset MyArray
  print hex$(ecx), 9, "MyArray", 13, 10

  mov ecx, offset HelloW$
  inkey hex$(ecx), 9, "HelloW$", 13, 10

  exit

end start

OPT_Assembler JWasm
OxPT_Assembler AsmC
OPT_linker linkv614
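
The offsets one would expect from the listing above follow from rounding the current location up to the next multiple of the alignment. A minimal C sketch of that arithmetic (the helper name is illustrative, and offsets are relative to the start of the data section, which is assumed to begin on a 64-byte boundary itself):

```c
#include <assert.h>
#include <stdint.h>

/* Rounds 'offset' up to the next multiple of 'alignment'.
   'alignment' must be a power of two, as ALIGN requires. */
uint32_t align_up(uint32_t offset, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

The string "123", 0 occupies 4 bytes, so align_up(4, 64) puts MyArray at offset 64; its 7 bytes end at offset 71, and align_up(71, 64) puts HelloW$ at offset 128.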

nidud

#66
deleted

hutch--

You may find this stuff a bit long in the tooth. The world has changed multiple times since the PIV. Core2 and the later "i" series hardware do different things. Humorously enough, unrolling does not work like it used to on old processors, but not because of such a limitation; it's due to the internal logic getting better.

nidud

#68
deleted

aw27

Quote from: jj2007 on May 24, 2017, 08:27:56 PM
Relics from old DOS times? JWasm issues a warning, yes, HJWasm doesn't, but both produce correct code.
Also with the most recent MS linker  :eusa_clap:. The problem is that not everybody produces standalone Assembly language programs these days; actually, almost nobody does.
This is what happens when I run a program linking a similar module assembled with the .data + align 32 combo with a HLL like Delphi or Free Pascal:


Because the Delphi and FPC linkers understand the full segment definition syntax, probably they consider what JWASM/UASM are doing is just a hack.
You see, Masm issues an error; it is only the MS Linker that lets the bad things pass through the raindrops.


johnsa

I've applied the same command-line switch as in Nidud's post to Uasm 2.34 now. Will update the site etc. once I've run some more testing on other changes.


TWell

#71
Quote from: aw27 on May 24, 2017, 10:52:10 PM
This is what happens when I run a program linking a similar module assembled with the .data + align 32 combo with a HLL like Delphi or Free Pascal:


Because the Delphi and FPC linkers understand the full segment definition syntax, probably they consider what JWASM/UASM are doing is just a hack.
You see, Masm issues an error, it is only the MS Linker that let the bad things pass through the raindrops.
Interesting.
With this code:
foo.asm
.386
.model flat, c
public foo1
public foo2
public foo3
.data
db "123", 0
align 64
foo1 db 25, 18, 23, 17, 9, 2, 6
align 64
foo2 db 2
foo3 db 2
end
program test;
{$LINK foo.obj}

type pchar = ^char;
function printf(fmt : pchar):longint; cdecl; varargs;  external 'msvcrt';

var 
  foo1 : pchar; cvar; external;
  foo2 : pchar; cvar; external;
  foo3 : pchar; cvar; external;

begin
PrintF('Hello Free Pascal'#10);
PrintF('ptr: %p'#10, @foo1);
PrintF('ptr: %p'#10, @foo2);
PrintF('ptr: %p'#10, @foo3);
end.

Hello Free Pascal
ptr: 00402040
ptr: 00402080
ptr: 00402081
ppc386.exe

EDIT: now with uasm64.exe -Sp32
program test;
{$LINK AVXTest1.obj}
{$LINK AVXTest2.obj}

type pchar = ^char;
function printf(fmt : pchar):longint; cdecl; varargs;  external 'msvcrt';

var 
  AbsMask1 : pchar; cvar;  external;
  NegMask1 : pchar; cvar; external;
  AbsMask2 : pchar; cvar; external;
  NegMask2 : pchar; cvar; external;
  foo1:char =#0;

begin
PrintF('ptr: %p'#10, @foo1);
PrintF('ptr: %p'#10, @AbsMask1);
PrintF('ptr: %p'#10, @NegMask1);
PrintF('ptr: %p'#10, @AbsMask2);
PrintF('ptr: %p'#10, @NegMask2);
end.
.686
.model flat, stdcall
option casemap:none

public AbsMask1
public NegMask1

.data
AbsMask1 dword 7fffffffh,7fffffffh,7fffffffh,7fffffffh, 7fffffffh,7fffffffh,7fffffffh,7fffffffh
NegMask1 dword 80000000h,80000000h,80000000h,80000000h, 80000000h,80000000h,80000000h,80000000h

end
.686
.model flat, stdcall
option casemap:none

public AbsMask2
public NegMask2

_DATA1 SEGMENT ALIGN(32) FLAT 'DATA' ALIAS('.data')
AbsMask2 dword 7fffffffh,7fffffffh,7fffffffh,7fffffffh, 7fffffffh,7fffffffh,7fffffffh,7fffffffh
NegMask2 dword 80000000h,80000000h,80000000h,80000000h, 80000000h,80000000h,80000000h,80000000h
_DATA1 ends

end
ptr: 00402000
ptr: 00402020
ptr: 00402040
ptr: 00402060
ptr: 00402080
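
A quick way to read the pointer values printed above is to test the low bits of each address. A small C sketch, with an illustrative helper name, that checks an address against a power-of-two alignment:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if 'addr' is a multiple of 'alignment'.
   'alignment' must be a power of two. */
bool is_aligned(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

Applied to the output above, all five addresses from 00402000 to 00402080 are multiples of 32. In the earlier foo.asm run, 00402040 and 00402080 are multiples of 64, while 00402081 (foo3, which follows foo2 with no align directive) is not.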

aw27

Quote from: TWell on May 25, 2017, 02:10:51 AM
Interesting.
Good you have FPC. :t

Try the code in the attachment, changing in AVX.ASM
.data
align 32

instead of
_DATA1 SEGMENT ALIGN(32) FLAT 'DATA' ALIAS('.data')
AbsMask dword 7fffffffh,7fffffffh,7fffffffh,7fffffffh, 7fffffffh,7fffffffh,7fffffffh,7fffffffh
NegMask dword 80000000h,80000000h,80000000h,80000000h, 80000000h,80000000h,80000000h,80000000h
_DATA1 ends

and tell if it works  :bgrin:

johnsa

Update available to test:

http://www.terraspace.co.uk/uasm64.zip

1) The new segment alignment switch, -Sp, is in, so you can see if that helps once the module is linked into an HLL program. You can then just go back to using the standard simplified directives .data, .data? etc.

2) AW27, you can also re-check the BND/MPX stuff. I found a bug with BND CALL: it was hardcoded to assume a 2-byte instruction when calculating the displacement. With a BND prefix it obviously becomes 3, so that fix is in.

Let me know how it goes.
John
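
Why a wrong instruction length breaks the call target: x86 relative displacements are measured from the end of the instruction, so the encoder has to know the full length, prefixes included. The following C sketch shows the arithmetic only; it is not UASM's code, and the function and parameter names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Displacement to encode so that execution continues at 'target'.
   insn_addr: address of the first byte of the instruction.
   insn_len:  length counted up to the point the CPU measures from;
              a prefix byte such as BND adds one to it. */
int32_t rel_disp(uint32_t insn_addr, uint32_t insn_len, uint32_t target)
{
    assert(insn_len > 0);
    return (int32_t)(target - (insn_addr + insn_len));
}
```

If the length is taken as 2 where it should be 3, the encoded displacement is one too large and the call lands one byte past the intended target.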

aw27

Quote from: johnsa on May 25, 2017, 04:51:47 AM
1) The new segment alignment switch is in, -Sp so you can see if that helps once linked in to HLL, you can just go back to using standard simplified directives .data .data? etc.
I don't notice any benefit (the ASM module is from the attachment of my previous answer):
1- With new switch -Sp32 or -Sp



2-Using complete segment directives:



Conclusion: 1) does not align to 32 bytes


Quote
2) AW27, you can also re-check BND/MPX stuff, I found a bug with BND CALL, it was hardcoded to assume a 2 byte instruction when calculating the displacement, obviously with a BND prefix it becomes 3 so that fix is in..
It does not crash, but I don't know the reason  :badgrin: