Author Topic: Confusion about Architecture Selection (AVX or SSE)  (Read 641 times)

johnsa

  • Member
  • ****
  • Posts: 590
    • Uasm
Confusion about Architecture Selection (AVX or SSE)
« on: September 02, 2017, 06:48:01 AM »
Hi all,

In a recent conversation with a regular UASM user there was some confusion over why it was generating executables that wouldn't run on machines without AVX support. So I thought I'd copy the details here to serve as an explanation / reminder for anyone who may run into this:

OK, so to give you some background on what is going on: UASM/ASMC/JWASM all generate a lot of code for invoke/prologue/epilogue on procs etc. Even more so now in UASM, with its more advanced prologue/epilogue and macro library.

Traditionally ALL the instructions were generated as SSE (ASMC and JWASM). The problem arises when you write a procedure that uses AVX/AVX2 or AVX512: there is a massive penalty for transitioning between SSE and AVX modes. To avoid the state change costing thousands of cycles you can insert VZEROALL or VZEROUPPER instructions at the transition points.

The problem was that under some arrangements with SSE as the default / used in prologue there was no opportunity for the programmer to insert these to avoid the penalty, or in others you might simply forget and have no idea why the code is so slow.

Because of this we totally re-worked ALL that code to work more like a fully-fledged compiler (like VC/GCC etc) which gives you an option.

We have OPTION ARCH:SSE and OPTION ARCH:AVX which control this. Depending on that setting all proc/invoke/prologue/epilogue and macro library built-in functionality will use the corresponding instruction set, so you can switch that back and forth in your code as much as you like depending on your requirements for instruction set.

In your case, where you need the code to run on machines with SSE only and no AVX support, you should add OPTION ARCH:SSE to the code, or specify it on the command line via the switch.

OPTION ARCH:AVX was determined to be the best default, but with the command line switch or OPTION directive it's entirely up to you.
If you add OPTION ARCH:SSE then you should get MOVQ instead of the AVX equivalent VMOVQ.

The command line switches are listed when you use -?
They are:

-archSSE or -archAVX

You can use the OPTION multiple times in code without restriction, so you could wrap sets of SSE and AVX functions in them to provide different execution paths or library calls etc.
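
As a rough illustration of the "different execution paths" idea, here is a minimal C sketch of run-time dispatch between an SSE-only path and an AVX path. Everything in it is hypothetical: sum_sse_path and sum_avx_path are plain C stand-ins for procedures that would be assembled under OPTION ARCH:SSE and OPTION ARCH:AVX, and the CPU check assumes a GCC or Clang compiler on x86 (it simply falls back to the SSE path elsewhere).

```c
#include <stddef.h>

typedef float (*sum_fn)(const float *values, size_t count);

/* Hypothetical stand-in for a procedure assembled under OPTION ARCH:SSE. */
static float sum_sse_path(const float *values, size_t count)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Hypothetical stand-in for a procedure assembled under OPTION ARCH:AVX. */
static float sum_avx_path(const float *values, size_t count)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Assumption: GCC/Clang on x86 provide __builtin_cpu_supports.
   On any other compiler or target we report "no AVX". */
static int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Pick the execution path once; callers keep the returned pointer. */
static sum_fn pick_sum_path(void)
{
    return cpu_has_avx() ? sum_avx_path : sum_sse_path;
}
```

A caller would fetch the pointer once at start-up (sum_fn sum = pick_sum_path();) and call through it afterwards, so machines without AVX never execute a VEX-encoded instruction.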

John

johnsa

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #1 on: September 04, 2017, 07:26:20 PM »
Just to let you know, we have uploaded an update to 2.39 dated today 4th September.
All that has changed is the default architecture is now SSE instead of AVX to maximise default compatibility.

The OPTION ARCH and command line switches work as before.
So if you wanted to generate AVX opcodes in invoke/prologue/epilogue you'd explicitly enable it with OPTION ARCH:AVX or -archAVX on the command line.

John

aw27

  • Member
  • ****
  • Posts: 857
  • Let's Make ASM Great Again!
Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #2 on: September 04, 2017, 10:11:47 PM »
 :t

habran

  • Member
  • *****
  • Posts: 1116
    • uasm
Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #3 on: September 05, 2017, 12:25:55 AM »
 :biggrin:
There is one more thing that was added in the last build which John forgot to include in the Extended Manual:
 OPTION SWITCHSIZE:SIZE, which we limited to 8000h; the default is 4000h.
The purpose is to give the programmer the choice between speed and size.
Usage, e.g.:
  OPTION SWITCHSIZE:2000h
The mechanism is in hll.c:
Code: [Select]
.....
swsize = 0x4000;
.....
          if (ModuleInfo.Ofssize == USE32 || hll->csize == 4) {
            bubblesort(hll, hll->plabels, hll->pcases, hll->casecnt);
            if ((hll->delta * 4) <= (hll->casecnt * 4 + hll->casecnt * 2))
              hll->cflag = 6;                   /* we need only jump table */
            else if (hll->delta < 256)
              hll->cflag = 4;                   /* we need both jump table and count table byte size */
            else if (hll->delta < swsize)       /* size limited to 0x4000  */
              hll->cflag = 7;                   /* we need both jump table and count table word size */
            else
              hll->cflag = 5;                   /* we will use a binary tree */
            }

The Samples folder for both 32 and 64 bit contains switch32.asm, which gives an example of each version of the cases
and explains what each one produces.
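
To make the effect of OPTION SWITCHSIZE easier to see, the decision in the hll.c excerpt can be restated as a small stand-alone C function. This is only a sketch: choose_switch_layout and its parameter names are made up for illustration, and it assumes that delta is the span of the case values and casecnt the number of .case labels, which the excerpt does not spell out.

```c
#include <stdint.h>

/* Layout codes copied from the cflag values in the excerpt. */
enum {
    LAYOUT_BYTE_TABLES = 4,  /* jump table + byte-sized count table */
    LAYOUT_BINARY_TREE = 5,  /* no tables, binary tree of compares   */
    LAYOUT_JUMP_ONLY   = 6,  /* jump table only                      */
    LAYOUT_WORD_TABLES = 7   /* jump table + word-sized count table */
};

/* Hypothetical helper mirroring the if/else chain in the excerpt.
   swsize is the OPTION SWITCHSIZE limit (0x4000 by default). */
static int choose_switch_layout(uint32_t delta, uint32_t casecnt,
                                uint32_t swsize)
{
    /* 64-bit arithmetic so large spans cannot overflow the products. */
    uint64_t dense_cost  = (uint64_t)delta * 4;
    uint64_t sparse_cost = (uint64_t)casecnt * 4 + (uint64_t)casecnt * 2;

    if (dense_cost <= sparse_cost)
        return LAYOUT_JUMP_ONLY;
    if (delta < 256)
        return LAYOUT_BYTE_TABLES;
    if (delta < swsize)
        return LAYOUT_WORD_TABLES;
    return LAYOUT_BINARY_TREE;
}
```

Under these assumptions, a switch with 100 cases spread over a span of 3000h gets word-sized tables at the default limit, but falls back to the binary tree with OPTION SWITCHSIZE:2000h.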


« Last Edit: September 05, 2017, 05:56:11 AM by habran »
Cod-Father

nidud

  • Member
  • *****
  • Posts: 1411
    • https://github.com/nidud/asmc
Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #4 on: September 06, 2017, 12:30:56 AM »
This crashed with the latest build of Uasm (v2.39):
Code: [Select]

    .x64
    .model  flat, fastcall

    option  dllimport:<msvcrt>
    printf  proto :ptr byte, :vararg
    exit    proto :qword

    .data
    error  db "Uasm Error: %d",10,0

    .code

sw_uasm proc val

    .switch ecx

    enum = 0
    repeat 300
%   .case @CatStr(%enum)
    mov eax,enum
    enum = enum + 1
    endm

    enum = 600
    repeat 60
%   .case @CatStr(%enum)
    mov eax,enum
    enum = enum + 1
    endm

    enum = 1000
    repeat 1000
%   .case @CatStr(%enum)
    mov eax,enum
    enum = enum + 1
    endm

    .default
        xor eax,eax

    .endswitch
    ret

sw_uasm endp

main proc

    mov esi,299
    .while esi
        invoke sw_uasm,esi
        .if eax != esi
            invoke printf,addr error,esi
            .break
        .endif
        dec esi
    .endw
    mov esi,659
    .while esi >= 600
        invoke sw_uasm,esi
        .if eax != esi
            invoke printf,addr error,esi
            .break
        .endif
        dec esi
    .endw
    mov esi,1999
    .while esi >= 1000
        invoke sw_uasm,esi
        .if eax != esi
            invoke printf,addr error,esi
            .break
        .endif
        dec esi
    .endw
    mov edi,1000
    .while edi
        mov esi,2000
        .while esi
            invoke sw_uasm,esi
            dec esi
        .endw
        dec edi
    .endw
    invoke exit,0

main endp

    end main

johnsa

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #5 on: September 06, 2017, 06:40:57 PM »
Habran is investigating the switch issue. Update asap.

habran

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #6 on: September 09, 2017, 06:16:42 AM »
Thanks for pointing out the bug, nidud  :t
It was hard to find out why that was happening, but it is fixed now and the speed has been improved as well. I believe it will be slightly faster than ASMC, with less data than ASMC,
because in that particular case option 6 kicks in, in which only a jump table is created, without a word-sized count table. It creates less data in this case if we fill the gaps with the default label (or the exit label if there is no default), e.g. with .default = @C0559:
 @C0001, @C0002,@C0003,@C0004,@C0559,@C0006,@C0559,@C0559,@C0009...

32-bit UASM finds even more often that option 6 takes less data than option 7, because of the DWORD jump table.
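
The gap-filling idea behind option 6 can be sketched in C as follows. This is an illustration under assumptions, not UASM's code: labels are modelled as plain integers, and fill_jump_table and dispatch are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Build a dense table covering [min_value, min_value + span).
   Every slot starts as the default label; real cases overwrite it. */
static void fill_jump_table(int *table, size_t span, int min_value,
                            const int *case_values, const int *case_labels,
                            size_t casecnt, int default_label)
{
    for (size_t i = 0; i < span; i++)
        table[i] = default_label;

    for (size_t k = 0; k < casecnt; k++) {
        assert(case_values[k] >= min_value);      /* caller's contract */
        size_t slot = (size_t)(case_values[k] - min_value);
        assert(slot < span);
        table[slot] = case_labels[k];
    }
}

/* One bounds check, then a single indexed lookup: no second table. */
static int dispatch(const int *table, size_t span, int min_value,
                    int default_label, int value)
{
    unsigned idx = (unsigned)value - (unsigned)min_value;
    return idx < span ? table[idx] : default_label;
}
```

With case values 0, 1, 2, 3, 5, 8 carrying labels 1, 2, 3, 4, 6, 9 and a default of 559, the table comes out as 1, 2, 3, 4, 559, 6, 559, 559, 9, which matches the pattern in the label list quoted in the post.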

John will upload soon the UASM-2.40 with that and some more fixes 8)

nidud

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #7 on: September 09, 2017, 07:41:44 AM »
Well, the crash bug is below 2048. However, counts above that only work with unsigned values, so this fails:
Code: [Select]
    .x64
    .model  flat, fastcall

    option  dllimport:<msvcrt>
    printf  proto :ptr byte, :vararg
    exit    proto :qword

    .data
    error  db "Uasm Error: %d, %d",10,0

    .code

sw_uasm proc val

    .switch ecx

    enum = 0
    repeat 300
%   .case @CatStr(%enum)
    mov eax,enum
    enum = enum - 1
    endm

    enum = 600
    repeat 60
%   .case @CatStr(%enum)
    mov eax,enum
    enum = enum + 1
    endm

    enum = 6000
    repeat 2000
%   .case @CatStr(%enum)
    mov eax,enum
    enum = enum + 1
    endm

    .default
        xor eax,eax

    .endswitch
    ret

sw_uasm endp

main proc

    mov esi,-299
    .while esi
        invoke sw_uasm,esi
        .if eax != esi
            invoke printf,addr error,esi,eax
            .break
        .endif
        inc esi
    .endw
    mov esi,659
    .while esi >= 600
        invoke sw_uasm,esi
        .if eax != esi
            invoke printf,addr error,esi,eax
            .break
        .endif
        dec esi
    .endw
    mov esi,7999
    .while esi >= 6000
        invoke sw_uasm,esi
        .if eax != esi
            invoke printf,addr error,esi,eax
            .break
        .endif
        dec esi
    .endw
    mov edi,1000
    .while edi
        mov esi,8000
        .while esi
            invoke sw_uasm,esi
            dec esi
        .endw
        dec edi
    .endw
    invoke exit,0

main endp

    end main

Code: [Select]
?_0001: cmp     eax, 4294966997                         ; 5C20 _ 3D, FFFFFED5
        jl      @C0941                                  ; 5C25 _ 7C, F3
        cmp     eax, 7999                               ; 5C27 _ 3D, 00001F3F
        ja      @C0941                                  ; 5C2C _ 77, EC
        lea     r10, [@C0006]                           ; 5C2E _ 4C: 8D. 15, 00000000(rel)
        sub     eax, -299                               ; 5C35 _ 2D, FFFFFED5
        movzx   r10, word ptr [r10+rax*2]               ; 5C3A _ 4D: 0F B7. 14 42
        lea     rax, [@C0004]                           ; 5C3F _ 48: 8D. 05, 00000000(rel)
        jmp     qword ptr [rax+r10*8]                   ; 5C46 _ 42: FF. 24 D0
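
One possible reading of the listing above: the lower bound is tested with a signed jump (jl) and the upper bound with an unsigned one (ja), so a negative selector passes the first test and then fails the second, because as an unsigned number it is far above 7999. The C sketch below reproduces that behaviour and shows one common alternative (subtract the lower bound first, then do a single unsigned compare). The function names are hypothetical and this is not necessarily the fix that went into UASM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the listing: signed test for the lower bound (jl),
   unsigned test for the upper bound (ja). */
static bool in_range_mixed(int32_t value, int32_t low, int32_t high)
{
    if (value < low)
        return false;                        /* jl: signed compare   */
    if ((uint32_t)value > (uint32_t)high)
        return false;                        /* ja: unsigned compare */
    return true;
}

/* Alternative: bias by the lower bound, then one unsigned compare.
   Values below low wrap around to huge numbers and are rejected. */
static bool in_range_biased(int32_t value, int32_t low, int32_t high)
{
    uint32_t index = (uint32_t)value - (uint32_t)low;
    uint32_t span  = (uint32_t)high - (uint32_t)low;
    return index <= span;
}
```

For the bounds in the listing (-299 and 7999), in_range_mixed rejects -5 even though it is a valid case, while in_range_biased accepts it and still rejects -300 and 8000.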

jj2007

  • Member
  • *****
  • Posts: 7758
  • Assembler is fun ;-)
    • MasmBasic
Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #8 on: September 09, 2017, 10:47:30 AM »
    invoke exit,0

New syntax?
Code: [Select]
exit
exit error
invoke ExitProcess, error

nidud

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #9 on: September 09, 2017, 11:37:13 AM »
New syntax?

 :biggrin:

Well, now you see what happens when you slip up and forget to add the standard syntax:

include \masm32\MasmBasic\MasmBasic.inc ; download version 6 September

Then this happens:
the smally.vcxproj tried to open Visual Studio, and that took a while :greensml:

I like Erol's code. The Pelles C IDE opens in 3 seconds, it builds in 2 seconds, and it doesn't complain about any errors :t

And now this:

C Standard Library Reference Tutorial

C library function - exit()

jj2007

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #10 on: September 09, 2017, 12:45:52 PM »
Well, now you see what happens when you slip up and forget to add the standard syntax

Dear nidud,

Really, you shouldn't post here when you are drunk! Just in case you haven't noticed: This is not a C forum, so "standard syntax" here is either exit (the Masm32 macro) or invoke ExitProcess, 0 (Windows API).

Obviously you don't have the faintest idea what spaghetti code really is. MasmBasic definitely doesn't fall into that category.

Btw, I have often praised AsmC for its speed, and I am still using it now and then. But if you keep adding all those crappy "improvements", AsmC will become a confused wannabe assembler, and the worst C compiler developed in the 21st century :icon_mrgreen:

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 4935
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #11 on: September 09, 2017, 02:41:54 PM »
 :biggrin:

There is nothing wrong with spaghetti code as long as you use the right sauce. It seems it's been easy to forget what an assembler is. At its core it screws user-selected mnemonics together to create object modules that, with other compatible object modules, can be linked together to make executable code. There is good reason to assist that with higher level code like .IF, switch blocks, some loop operators and the like, as a modern assembler must use external high level code to interact with the operating system, but once you shift past that you are writing a compiler.

Instead of political correctness on how you write code with the restrictions of the poorly designed loop code of so many modern compilers, assembler allows you to write from simple one dimension loops to complex variations like multi nested conditional loops, crossfire loops, multitudes of interdependent loops and all you need to do is know how to write them.

With an assembler you are free of strong typing that even C compilers need to bypass from time to time and you will note that you still have "goto" to solve the problems of crappy loop design when you need to exit a location without the assumptions of one dimensional loop methods.

Now the notion of "standard syntax" certainly has its place, but only in the context of a formalised standard for a compiler (C89, C99 and whatever comes after it). It has no place with an assembler, where you have the power to write anything you know how to do without the code being crippled by junky assumptions. I see nothing wrong with trying to write a C compiler, but it is an entirely different animal to an assembler.

A C compiler and a compatible assembler are not competitors but complementary object module creators. While at least some C code is supplemented with assembler modules, it is equally easy to use an object module created in C to either test out or use a C algorithm in an assembler application. About all you have to do is have a good look at it as mnemonic code and, once you get over the YUK factor of the tangled mess you discover, if there are any gains to be had you tweak it to make it faster.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :biggrin:

nidud

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #12 on: September 10, 2017, 12:21:45 AM »
Dear nidud,

 :biggrin:

Quote
Really, you shouldn't post here when you are drunk!

If I was drunk then this would actually be true.

Quote
Just in case you haven't noticed: This is not a C forum, so "standard syntax" here is either exit (the Masm32 macro) or invoke ExitProcess, 0 (Windows API).

This sub-forum is dedicated to the development of UASM which is written in C so you shouldn't be all that surprised if C code is frequently posted here as part of this development.

Quote
Obviously you don't have the faintest idea what spaghetti code really is. MasmBasic definitely doesn't fall into that category.

Right.

Quote
Btw, I have often praised AsmC for its speed, and I am still using it now and then.

That's fine and I do appreciate all the testing you've done to achieve that. But as for using the assembler, though it is more stable now, this statement still stands:

Quote
In general terms the HLL section should be able to eliminate all labels to prevent any "spaghetti" jumps. For this to be possible it should have at least the same possibilities as C and the current implementation is even more flexible with regards to control flow. The aim must then be to remove restrictions if possible.

To achieve this the break statement is now removed from the switch and replaced with endc. So to write any code based on the HLL section in ASMC may not be a good idea since the whole concept may be rewritten at any time.

Quote
But if you keep adding all those crappy "improvements", AsmC will become a confused wannabe assembler

I assume you refer to the anti-spaghetti implementations in this case and I sort of understand your point of view in that regard. I do however think many (if not all) of these arguments are more based on emotions or being a true Scotsman than logic and reason. I do however (for some reason) find that somewhat amusing.

Quote
and the worst C compiler developed in the 21st century :icon_mrgreen:

Given I have a rather conservative view of the debate regarding the assembler/compiler issue, a more correct statement would probably be the second worst C compiler. Needless to say these emotional outbursts are not very constructive.

johnsa

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #13 on: September 10, 2017, 02:30:58 AM »

Quote
and the worst C compiler developed in the 21st century :icon_mrgreen:

Given I have a rather conservative view of the debate regarding the assembler/compiler issue, a more correct statement would probably be the second worst C compiler. Needless to say these emotional outbursts are not very constructive.
What's the first worst C compiler ? :)

nidud

Re: Confusion about Architecture Selection (AVX or SSE)
« Reply #14 on: September 10, 2017, 02:58:56 AM »
 :biggrin:

The one who adds the most crappy improvements apparently.