Confusion about Architecture Selection (AVX or SSE)

Started by johnsa, September 02, 2017, 06:48:01 AM


johnsa

Hi all,

In a recent conversation with a regular UASM user there was some confusion over why it was generating executables that wouldn't run on machines without AVX support. So I thought I'd copy the detail here to serve as an explanation / reminder to anyone who may run into this:

OK, so to give you some background on what is going on: UASM/ASMC/JWASM all generate a lot of code for invoke/prologue/epilogue on procs etc., even more so now in UASM with its more advanced prologue/epilogue and macro library.

Traditionally ALL the instructions were generated as SSE (ASMC and JWASM). The problem then arises that if you write a procedure that uses AVX/AVX2 or AVX512, there is a massive penalty from transitioning between SSE and AVX modes. To reduce this penalty you can insert VZEROALL or VZEROUPPER instructions to avoid the state change costing thousands of cycles.

The problem was that under some arrangements, with SSE as the default used in the prologue, there was no opportunity for the programmer to insert these to avoid the penalty, and in others you might simply forget and have no idea why the code is so slow.

Because of this we totally re-worked ALL that code to work more like a fully-fledged compiler (like VC/GCC etc) which gives you an option.

We have OPTION ARCH:SSE and OPTION ARCH:AVX which control this. Depending on that setting all proc/invoke/prologue/epilogue and macro library built-in functionality will use the corresponding instruction set, so you can switch that back and forth in your code as much as you like depending on your requirements for instruction set.

In your case, where you need the code to run on machines with SSE only and no AVX support, you should add OPTION ARCH:SSE to the code, or specify it on the command line via a switch. If you add that, you should get MOVQ instead of the AVX equivalent VMOVQ.

OPTION ARCH:AVX was determined to be the best default, but with the command line switch or OPTION directive it's entirely up to you.

The command line switches are listed when you use -?
They are:

-archSSE or -archAVX

You can use the OPTION multiple times in code without restriction, so you could wrap sets of SSE and AVX functions in them to provide different execution paths or library calls etc.

John

johnsa

Just to let you know, we have uploaded an update to 2.39 dated today 4th September.
All that has changed is the default architecture is now SSE instead of AVX to maximise default compatibility.

The OPTION ARCH and command line switches work as before.
So if you wanted to generate AVX opcodes in invoke/prologue/epilogue you'd explicitly enable it with OPTION ARCH:AVX or -archAVX on the command line.

John


habran

 :biggrin:
There is one more thing that was added in the last build and that John forgot to include in the Extended Manual:
OPTION SWITCHSIZE:SIZE, which we limited to 8000h; the default is 4000h.
The purpose is to give the programmer the choice between speed and size.
Usage, e.g.:
  OPTION SWITCHSIZE:2000h
The mechanics are in hll.c:

.....
swsize = 0x4000;
.....
          if (ModuleInfo.Ofssize == USE32 || hll->csize == 4) {
            bubblesort(hll, hll->plabels, hll->pcases, hll->casecnt);
            if ((hll->delta * 4) <= (hll->casecnt * 4 + hll->casecnt * 2))
              hll->cflag = 6;                   /* we need only jump table */
            else if (hll->delta < 256)
              hll->cflag = 4;                   /* we need both jump table and count table byte size */
            else if (hll->delta < swsize)       /* size limited to swsize (0x4000 by default) */
              hll->cflag = 7;                   /* we need both jump table and count table word size */
            else
              hll->cflag = 5;                   /* we will use a binary tree */
          }
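
For anyone who wants to play with the thresholds, the selection above can be sketched as a stand-alone C function. This is only an illustration, not the actual hll.c code: the name pick_switch_strategy and the enum names are made up here, and delta is assumed to be the span between the lowest and highest case value. Only the cflag numbers and the comparisons are taken from the excerpt.

```c
/* Hypothetical stand-alone sketch of the cflag selection shown in the
   hll.c excerpt. All names here are illustrative, not UASM's. */
enum switch_strategy {
    SW_JUMP_BYTE   = 4,  /* jump table + byte-sized count table */
    SW_BINARY_TREE = 5,  /* binary tree */
    SW_JUMP_ONLY   = 6,  /* jump table only */
    SW_JUMP_WORD   = 7   /* jump table + word-sized count table */
};

/* delta:   assumed to be the span of the case values
   casecnt: number of case labels
   swsize:  the OPTION SWITCHSIZE value (0x4000 by default per the post) */
static int pick_switch_strategy(unsigned long delta, unsigned long casecnt,
                                unsigned long swsize)
{
    if (delta * 4 <= casecnt * 4 + casecnt * 2)
        return SW_JUMP_ONLY;
    else if (delta < 256)
        return SW_JUMP_BYTE;
    else if (delta < swsize)
        return SW_JUMP_WORD;
    return SW_BINARY_TREE;
}
```

Under these assumptions, 10 cases spread over a span of 1000 get option 7 with the default swsize of 0x4000, and fall back to the binary tree (option 5) if swsize is lowered to 0x200.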


The Samples folder, in both 32 and 64 bit, contains switch32.asm, which gives an example of each version of the cases
and an explanation of what it produces.


Cod-Father

nidud

deleted

johnsa

Habran is investigating the switch issue. Update asap.

habran

Thanks for pointing out the bug, nidud  :t
It was hard to find why that was happening, but it is fixed now and the speed is improved as well. I believe it will be slightly faster than ASMC, with less data than ASMC,
because in that particular case option 6 kicks in, in which only a jump table is created, without a word-sized count table. It creates less data in this case if we insert the default label (or the exit label if there is no default) for the missing cases, e.g. with .default = @C0559:
@C0001, @C0002, @C0003, @C0004, @C0559, @C0006, @C0559, @C0559, @C0009...
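
The gap filling in that label list can be sketched in C as follows. This is a hypothetical illustration, not UASM's code: build_jump_table and its parameters are invented names, labels are represented as plain integers, and the cases are assumed to be sorted ascending and unique (the hll.c excerpt sorts them first).

```c
#include <stddef.h>

/* Hypothetical sketch: build a dense jump table covering every value from
   cases[0] to cases[casecnt - 1]. Slots with no matching case get
   default_target, so one indexed jump handles all values in range.
   Assumes cases[] is sorted ascending with no duplicates.
   Returns the number of entries written, or 0 if the table is too small. */
static size_t build_jump_table(const int *cases, const int *targets,
                               size_t casecnt, int default_target,
                               int *table, size_t table_cap)
{
    size_t entries, i;

    if (casecnt == 0)
        return 0;
    entries = (size_t)(cases[casecnt - 1] - cases[0]) + 1;
    if (entries > table_cap)
        return 0;

    for (i = 0; i < entries; i++)       /* every slot starts as the default */
        table[i] = default_target;
    for (i = 0; i < casecnt; i++)       /* then patch in the real cases */
        table[cases[i] - cases[0]] = targets[i];
    return entries;
}
```

For cases 1, 2, 3, 4, 6, 9 with default 559 this yields 1, 2, 3, 4, 559, 6, 559, 559, 9, the same pattern as the @C labels above.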

32-bit UASM finds even more often that option 6 takes less data than option 7, because of the DWORD jump table.

John will soon upload UASM 2.40 with that and some more fixes 8)

nidud

deleted

jj2007


nidud

deleted

jj2007

Quote from: nidud on September 09, 2017, 11:37:13 AM
Well, now you see what happens when you slip up and forget to add the standard syntax

Dear nidud,

Really, you shouldn't post here when you are drunk! Just in case you haven't noticed: This is not a C forum, so "standard syntax" here is either exit (the Masm32 macro) or invoke ExitProcess, 0 (Windows API).

Obviously you don't have the faintest idea what spaghetti code really is. MasmBasic definitely doesn't fall into that category.

Btw, I have often praised AsmC for its speed, and I am still using it now and then. But if you keep adding all those crappy "improvements", AsmC will become a confused wannabe assembler, and the worst C compiler developed in the 21st century :icon_mrgreen:

hutch--

 :biggrin:

There is nothing wrong with spaghetti code as long as you use the right sauce. It seems it's been easy to forget what an assembler is. At its core it screws user-selected mnemonics together to create object modules that, with other compatible object modules, can be linked together to make executable code. There is good reason to assist with the use of higher level code like .IF, Switch blocks, some loop operators and the like, as a modern assembler must use external high level code to interact with the operating system, but once you shift past that you are writing a compiler.

Instead of political correctness on how you write code with the restrictions of the poorly designed loop code of so many modern compilers, assembler allows you to write from simple one dimension loops to complex variations like multi nested conditional loops, crossfire loops, multitudes of interdependent loops and all you need to do is know how to write them.

With an assembler you are free of strong typing that even C compilers need to bypass from time to time and you will note that you still have "goto" to solve the problems of crappy loop design when you need to exit a location without the assumptions of one dimensional loop methods.

Now the notion of "standard syntax" certainly has its place, but only in the context of a formalised standard for a compiler, C89, C99 and whatever comes after it. It has no place with an assembler, where you have the power to write anything you know how to do without the code being crippled by junky assumptions. I see nothing wrong with trying to write a C compiler but it is an entirely different animal to an assembler.

A C compiler and a compatible assembler are not competitors but complementary object module creators. While at least some C code is supplemented with assembler modules, it is equally as easy to use an object module created in C to either test out or use a C algorithm in an assembler application. About all you have to do is have a good look at it as mnemonic code and once you get over the YUK factor of the tangled mess you discover, if there are any gains to be had you tweak it to make it faster.

nidud

deleted

johnsa

Quote from: nidud on September 10, 2017, 12:21:45 AM
Quote from: jj2007 on September 09, 2017, 12:45:52 PM
and the worst C compiler developed in the 21st century :icon_mrgreen:

Given I have a rather conservative view of the debate regarding the assembler/compiler issue, a more correct statement would probably be the second worst C compiler. Needless to say these emotional outbursts are not very constructive.

What's the first worst C compiler ? :)

nidud

deleted