Instruction Set Detection including AVX-512 sub-sets

Started by Gunther, December 26, 2017, 09:37:26 AM

Gunther

The attached zip archive, is.zip, contains the C and assembly language sources. The entire application is compiled with gcc, because I have no other compiler running here. I hope that Habran can help. I used ANSI C, even for the comments, so compilation with VS should not be a problem.

But the main work is done by assembler routines. They are built with the latest release of UASM, an excellent tool that works without complaint, by the way. The detection up to AVX2 isn't a big deal. The tricky part is the impressive AVX-512 part, together with the existing and scheduled Intel architectures. The program tests a total of 15 different AVX-512 sets and subsets. AVX-512 F is indicated by feature number 14. All other extensions each have their own flag variable, which can be tested separately. I'll try to explain the reason for that now. AVX-512 at first glance seems to be a big mess. But let's take a closer look; we have, for example:

  • AVX-512 F: AVX-512 Foundation is the natural extension of AVX/AVX2. It uses the EVEX prefix, which builds on the existing VEX prefix. Any processor that implements any portion of the AVX-512 extensions MUST implement AVX-512 F.
  • AVX-512 CD: Conflict Detection instructions offer additional vectorization of loops with possible address conflict.
  • AVX-512 DQ: DWORD and QWORD instructions add new 32-bit and 64-bit AVX-512 instructions (conversion, transcendental support, etc.).
  • AVX-512 PF: Prefetch instructions add new prefetches for gather/scatter.
  • AVX-512 ER: Exponential and Reciprocal instructions for various scientific applications.
  • AVX-512 VL: Vector Length instructions add vector length orthogonality, allowing most AVX-512 operations to also operate on XMM and YMM registers.
  • AVX-512 BW: Byte and Word instructions add support for 8-bit and 16-bit integers.
  • AVX-512 IFMA: Integer Fused Multiply-Add instructions add support for fused multiply add of integers using 52-bit precision.
  • AVX-512 VBMI: Vector Bit Manipulation instructions add additional vector byte permutations.
  • AVX-512 4FMAPS: Fused Multiply-Add Packed Single for deep learning (AI).
  • AVX-512 4VNNIW: Vector Neural Network Instructions, Word variable precision, for deep learning (AI).
  • AVX-512 VPOPCNTDQ: Population Count support for DWORD and QWORD.
For more details, please check Intel's manuals, for example here. There you can also read about Galois Field Affine Transformations, which are important in number theory and cryptology. I won't say more about them now.
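As a rough illustration of how the sub-sets listed above map to CPUID bits, here is a minimal C sketch. It is not the code from is.zip (the attached routines are written in assembly with UASM); the enum names and the functions `decode_avx512` and `query_avx512` are made up for this example. The bit positions are the ones documented for CPUID leaf 7, sub-leaf 0. The decoder reports nothing when the Foundation bit is clear, which mirrors the rule that every AVX-512 sub-set requires AVX-512 F.

```c
#include <stdint.h>

/* Illustrative flag bits, one per sub-set; not taken from the attached sources. */
enum {
    AVX512_F      = 1u << 0,  AVX512_CD        = 1u << 1,
    AVX512_DQ     = 1u << 2,  AVX512_PF        = 1u << 3,
    AVX512_ER     = 1u << 4,  AVX512_VL        = 1u << 5,
    AVX512_BW     = 1u << 6,  AVX512_IFMA      = 1u << 7,
    AVX512_VBMI   = 1u << 8,  AVX512_4FMAPS    = 1u << 9,
    AVX512_4VNNIW = 1u << 10, AVX512_VPOPCNTDQ = 1u << 11
};

/* Decode the EBX/ECX/EDX values returned by CPUID leaf 7, sub-leaf 0. */
unsigned decode_avx512(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    unsigned m = 0;

    if (!(ebx & (1u << 16)))              /* no Foundation, so no sub-sets */
        return 0;
    m |= AVX512_F;
    if (ebx & (1u << 17)) m |= AVX512_DQ;
    if (ebx & (1u << 21)) m |= AVX512_IFMA;
    if (ebx & (1u << 26)) m |= AVX512_PF;
    if (ebx & (1u << 27)) m |= AVX512_ER;
    if (ebx & (1u << 28)) m |= AVX512_CD;
    if (ebx & (1u << 30)) m |= AVX512_BW;
    if (ebx & (1u << 31)) m |= AVX512_VL;
    if (ecx & (1u << 1))  m |= AVX512_VBMI;
    if (ecx & (1u << 14)) m |= AVX512_VPOPCNTDQ;
    if (edx & (1u << 2))  m |= AVX512_4VNNIW;
    if (edx & (1u << 3))  m |= AVX512_4FMAPS;
    return m;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the running CPU; only built with gcc/clang on x86. */
unsigned query_avx512(void)
{
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, 0) < 7)        /* leaf 7 not available */
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return decode_avx512(ebx, ecx, edx);
}
#endif
```

A caller would test individual flags, for example `if (query_avx512() & AVX512_BW) ...`. Keeping the decoder separate from the CPUID call makes it possible to check the bit logic with made-up register values on any machine.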

All in all, a lot is coming up. AMD will have to join in eventually, so as not to fall behind technologically. All these sets and sub-sets are tested and reported. Typical output of the program looks like this:

Supported Features by Processor and Operating System
====================================================

Vendor String: GenuineIntel
Brand  String: Intel(R)Core(TM)i7-7820XCPU@3.60GHz

Instruction Sets
----------------

MMX  SSE  SSE2  SSE3  SSSE3  SSE4.1  SSE4.2  AVX  AVX2  AVX-512 F

It's safe to use the following AVX-512 extensions with this machine:
--------------------------------------------------------------------

AVX-512 DQ      : DWORD and QWORD instructions (conversion, transcendental support etc.)
AVX-512 CD      : Conflict Detection instructions offer additional vectorization of loops with possible
                  address conflicts.
AVX-512 BW      : Byte and Word support for 8- and 16-bit integers.
AVX-512 VL      : Vector Length instructions add vector length orthogonality, allowing most AVX-512 instructions
                  to also operate on XMM and YMM registers.

Any processor that implements any portion of the AVX-512 extensions MUST implement AVX-512 F. Some AVX-512 extensions
are currently only planned by Intel. Not every architecture has all the instruction sets built in:

Knights Landing provides      : CD, ER, and PF.
Skylake provides              : CD, BW, DQ, and VL.
For Cannon Lake are scheduled : CD, BW, DQ, VL, IFMA, and VBMI.
For Icelake are scheduled     : CD, BW, DQ, VL, IFMA, and VBMI.
For Knights Mill are scheduled: CD, ER, PF, 4FMAPS, 4VNNIW, and VPOPCNTDQ.

That's what Intel has released so far.


If AVX-512 is not present on a machine, nothing bad happens: only the available instruction sets and the bottom part listing the architectures are displayed. By the way, CPU-Z displays only AVX-512 F on the same machine.
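The output heading speaks of features supported by processor and operating system. How the attached program checks the OS side is not shown in this thread, so the following C sketch is only one common way to do it and rests on these assumptions: the OS signals XSAVE support through the OSXSAVE bit (CPUID leaf 1, ECX bit 27), and AVX-512 registers are usable only when XCR0 has the SSE, AVX, opmask, ZMM_Hi256 and Hi16_ZMM state bits set (bits 1, 2, 5, 6 and 7). The function names are hypothetical.

```c
#include <stdint.h>

/* Pure check: does the OS enable the full AVX-512 register state?
   osxsave is CPUID.1:ECX bit 27, xcr0 is the value read by XGETBV(0). */
int os_enables_avx512(int osxsave, uint64_t xcr0)
{
    /* SSE (1), AVX (2), opmask (5), ZMM_Hi256 (6), Hi16_ZMM (7) = 0xE6 */
    const uint64_t need = (1u << 1) | (1u << 2) | (1u << 5) | (1u << 6) | (1u << 7);

    return osxsave && (xcr0 & need) == need;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read extended control register 0; faults if OSXSAVE is not set. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;

    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Combine both steps on the running machine (gcc/clang on x86 only). */
int avx512_state_enabled(void)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))              /* OSXSAVE clear: do not run XGETBV */
        return 0;
    return os_enables_avx512(1, read_xcr0());
}
#endif
```

In this sketch a CPUID feature bit alone is not treated as "safe to use"; the AVX-512 section would be reported only if `avx512_state_enabled()` also returns nonzero.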

Tests on many different machines and environments, as well as comments, would be welcome. Thank you.

Gunther


You have to know the facts before you can distort them.

jj2007

It seems my machine is a bit old...
Brand  String: Intel(R)Core(TM)i5-2450MCPU@2.50GHz

        Instruction Sets
        ----------------

MMX  SSE  SSE2  SSE3  SSSE3  SSE4.1  SSE4.2  AVX

It's safe to use the following AVX-512 extensions with this machine:
--------------------------------------------------------------------


Any processor that implements any portion of the AVX-512 extensions

Gunther

Jochen,

thank you for testing the application.

Quote from: jj2007 on December 26, 2017, 10:29:36 AM
It seems my machine is a bit old...

But you have AVX, which was a big step forward. I should change the logic in the main procedure: the AVX-512 part should only be displayed if the instruction set really exists. That's probably better, to avoid confusion.

Gunther
You have to know the facts before you can distort them.

Gunther

I've changed the logic of the main procedure a bit. The AVX-512 part is now only displayed if the instruction set really exists. That's probably better, to avoid confusion.
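The changed display logic could look roughly like the following C sketch. It is not the actual main procedure from is.zip; the flag names, the function `avx512_section` and the shortened output lines are invented for illustration. The point is only that the whole section is skipped when the Foundation flag is missing.

```c
#include <stdio.h>

/* Hypothetical flag bits for this sketch only. */
enum { F512 = 1u << 0, DQ512 = 1u << 1, CD512 = 1u << 2,
       BW512 = 1u << 3, VL512 = 1u << 4 };

/* Write the AVX-512 section into buf. Returns 1 if the section should be
   printed, 0 if AVX-512 F is absent and the section must be left out. */
int avx512_section(unsigned mask, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } subsets[] = {
        { DQ512, "DQ" }, { CD512, "CD" }, { BW512, "BW" }, { VL512, "VL" }
    };
    size_t used = 0;
    size_t i;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    if (!(mask & F512))                   /* no Foundation: show nothing */
        return 0;

    for (i = 0; i < sizeof subsets / sizeof subsets[0]; i++) {
        if (mask & subsets[i].bit) {
            int n = snprintf(buf + used, len - used,
                             "AVX-512 %s\n", subsets[i].name);
            if (n < 0 || (size_t)n >= len - used)
                break;                    /* buffer full, stop appending */
            used += (size_t)n;
        }
    }
    return 1;
}
```

The caller prints the "It's safe to use the following AVX-512 extensions" heading and the buffer only when the function returns 1, so a machine like the i5-2450M above would get no AVX-512 section at all.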

Gunther
You have to know the facts before you can distort them.

Biterider


habran

Hi Gunther :biggrin:
here is what I get:
Quote
Microsoft Windows [Version 6.3.9600]
(c) 2013 Microsoft Corporation. All rights reserved.

D:\Downloads\is>is
        Supported Features by Processor and Operating System
        ====================================================

Vendor String: GenuineIntel
Brand  String: Intel(R)Core(TM)i7-4700MQCPU@2.40GHz

        Instruction Sets
        ----------------

MMX  SSE  SSE2  SSE3  SSSE3  SSE4.1  SSE4.2  AVX  AVX2


D:\Downloads\is>
It is nicely done and it does the job, however, I would suggest you to take some time to write all in asm using hll
to make it all easily readable and to show the capabilities of UASM; we could then use that project in the Samples project.
There is no rush, take your time; it will also give you a chance to get more familiar with UASM's capabilities.

regards
Cod-Father

mabdelouahab

Quote
        Supported Features by Processor and Operating System
        ====================================================

Vendor String: GenuineIntel
Brand  String: Intel(R)Core(TM)i5-4210UCPU@1.70GHz

        Instruction Sets
        ----------------

MMX  SSE  SSE2  SSE3  SSSE3  SSE4.1  SSE4.2  AVX  AVX2



Gunther

Thanks to mabdelouahab and Biterider for testing.  :t

Habran,

Quote from: habran on December 26, 2017, 06:37:00 PM
It is nicely done and it does the job, however, I would suggest you to take some time to write all in asm using hll ...

Are you sure that it justifies the effort? You know that I'm not a big fan of macros. But we will see; there is a lot of work at the moment.

Gunther
You have to know the facts before you can distort them.

nidud

deleted

Gunther

Hi nidud,

thank you for your answer; very interesting, indeed. I think I should take a closer look at ASMC. It could be of great use to me soon. I refer to this post inside of this thread.

Branislaw (aka Habran), Johnsa and you have done a great deal, hats off. That's exactly Hutch's idea: bring the assembler back to the desktop. Especially young, inexperienced high-level language programmers will benefit from your work. The other side of the coin is: There are cases where these comfortable high-level language elements are even harmful. But maybe we can discuss that in a PM.

Unfortunately, your program crashes. It looks like there is something wrong on the stack.

Gunther
You have to know the facts before you can distort them.

habran

Thanks for the flowers Gunther :biggrin:
Quote
There are cases where these comfortable high-level language elements are even harmful.
I wouldn't agree with you here ;)
Assembler programmers must always look at the disassembly to see what the assembler produced and, if not happy, optimise it. HLL helps users, as well as programmers, to easily understand what the code is for. Proper comments can support that as well.
We have taken care to optimise the HLL built into UASM, and in most cases it may well produce better and safer code than doing it manually.
However, everyone has a different opinion and different needs, so it is pretty hard to satisfy everyone.

John and I have been working 28 hours a day on UASM and we are both perfectionists, so the results must show sooner or later 8)
Our purpose is to please ourselves in the first place, so anyone who finds our baby useful is welcome to the club :biggrin:
Cod-Father

jj2007

Quote from: nidud on December 27, 2017, 04:29:27 AM
The whole point with extending the functionality of the HLL section of the assembler is removal of macros.

And that is precisely the risk of this approach: macros can be changed by the coder at any time, built-in stuff can't; you are stuck with it. The more features you pack into the assembler itself, the more things can go utterly wrong. Why is C++ such a disaster? Because they had to squeeze in twenty different and sophisticated methods ("classes"?) to peel a banana :(

Gunther

Branislaw,

Quote from: habran on December 27, 2017, 08:55:06 AM
I wouldn't agree with you here ;)
Assembler programmers must always look at the disassembly to see what the assembler produced and, if not happy, optimise it. HLL helps users, as well as programmers, to easily understand what the code is for. Proper comments can support that as well.
We have taken care to optimise the HLL built into UASM, and in most cases it may well produce better and safer code than doing it manually.

No offense, please. John and you are hardworking people and have done a great deal. In principle, you are right; in most cases that's true. But there are some rare cases where you need absolute control over every bit in the machine, and then only manual work helps. I think we agree on that. That was my point. Please have a look at this post, and you'll know what I mean. When I change the framework in the midst of an important project, I want to be sure that I'm not replacing the old overhead with new overhead. These are my concerns and they are justified, right? Every time I look at the assembly output of C++, I tear my hair out. Granted, due to the algorithm, the code is pretty nested and twisted. But I have never seen a botch like the one the compiler produces for it, even at the highest level of optimization. I think you wouldn't want to trade places with me right now. But somehow we will get it done.

Quote from: jj2007 on December 27, 2017, 09:43:25 AM
The more features you pack into the assembler itself, the more things can go utterly wrong. Why is C++ such a disaster? Because they had to squeeze in twenty different and sophisticated methods ("classes"?) to peel a banana :(

Jochen hit the nail on the head. He seems to have had bad experiences with C++, that's life.

Gunther
You have to know the facts before you can distort them.

hutch--

I have this problem with the direction that the new Watcom forks are going in and that is trying to do too many things and trying to hold the hot little hand of the programmer instead of pointing them at the hard and complex stuff. At its most basic an assembler is a crude gadget to screw mnemonics together in the right order so it will run on a compatible processor and perform the task it was designed to do. I can see all sorts of good reasons to add familiar capacities for hacking through much of the high level code you need to produce in conjunction with normal mnemonic code, things like .IF, .SWITCH, procedure entry and exit, call automation (INVOKE and similar) and these can be appended on without compromising the crude basics of what an assembler is.

As soon as you go in the direction of complex error checking and the like you are starting to write a high level compiler and in fact this is how languages like early C started. "Just like assembler but easier". The more you hide from the programmer, the less useful the tool is and this would be very unfortunate as I know that a massive amount of work has been done to get these tools up and going. Complex capacity is something that should be done by programmers, not assembler designers in the assembler, if the basic accessories are done properly then a decent set of libraries adds the higher level capacities needed to make the tool address a much wider market. The assumption that the assembler's own internal high level code is better than what the assembler programmer writes is a dangerous one in that while it may be true in some instances, there are enough decent assembler programmers around who can make such assumptions false.

My comments break down to these,

1. Make sure the assembler IS an assembler, not a pseudo compiler.
2. Use modular design so that additional capacities are optional choices for the programmer.
3. Produce a decent PHUKING library or in fact many libraries, that is what made old C the major professional language for so many years.

Gunther

Quote from: hutch-- on December 27, 2017, 11:08:49 AM
My comments break down to these,

1. Make sure the assembler IS an assembler, not a pseudo compiler.
2. Use modular design so that additional capacities are optional choices for the programmer.
3. Produce a decent PHUKING library or in fact many libraries, that is what made old C the major professional language for so many years.

Amen to that, it's the way to go.  :t

Gunther
You have to know the facts before you can distort them.