The archive IsetsAVX-512.zip under the first post of this thread contains in \IsetsAVX-512\Isets\Source the following files:
build_isets.bat: The batch file for building the running EXE
is.asm: Assembly language procedures
ISets.c: C file with main()
ISets.h: C header with some variable declarations etc.
ISetsFunc.c: C file with functions
The binary folder contains the corresponding binaries. The C files were compiled with gcc version 7.2.0, but there's nothing compiler-specific in them; they should build with any other C compiler (clang, VS, Pelles C etc.), although that still has to be checked. Please have a look at the batch file for the appropriate switches. The assembly language source is written for NASM/YASM, because I have no other assembler running on my new machine, at least at the moment. I hope to get the MASM64 package running in the next few days. With some minor changes ml64 should also do the job, I hope.
In the last few days I had a very hard fight with the AVX-512 instruction set; it's a strange beast. I've checked the latest Intel manuals, and the search engine was running at full speed. The manuals are a bit tortuous and in some cases not accurate. On the other hand, one can find a lot of AVX-512 detection code written with those crippled intrinsics. The point is: in most cases the Intel compiler is used, which has a built-in detection wrapper; gcc has another set of intrinsics, and VS has a different intrinsic set, too. It's a shame. Furthermore, the intrinsic code isn't very readable. I spent the whole weekend on this work. All things considered, I'm doing just fine.
What's the result? There are several sets and subsets of AVX-512 instructions. You have to be good at set theory to draw the corresponding Venn diagram. Here's what I figured out:
- AVX-512 F is the Foundation instruction set; it extends most AVX instructions to 512-bit registers and adds masking, embedded broadcasting, embedded rounding and exception control.
- AVX-512 CD is the Conflict Detection instruction set, which allows vectorization of loops with vector dependency due to writing conflicts.
- AVX-512 BW is the Byte and Word support instruction set: 8-bit and 16-bit integer operations, processing up to 64 8-bit elements or 32 16-bit integer elements per vector.
- AVX-512 DQ is the Double and Quad word instruction set, which adds new instructions for double-word (32-bit) and quadword (64-bit) integer and floating-point elements.
- AVX-512 VL is the Vector Length extension, which adds support for vector lengths smaller than 512 bits (128 and 256 bits).
- AVX-512 PF is the Prefetch instruction set: data prefetching for gather and scatter instructions.
- AVX-512 ER Exponential and Reciprocal instruction set for high-accuracy base-2 exponential functions, reciprocals, and reciprocal square roots.
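To make the list a bit more concrete, here is a minimal sketch of how these subsets map to feature bits. It assumes you already have the EBX value returned by CPUID leaf 7, sub-leaf 0; the bit positions are the ones documented by Intel. The names `Avx512Flag`, `avx512_flags` and `avx512_decode` are made up for this illustration and are not taken from ISets.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One entry per AVX-512 subset: bit position in CPUID.(EAX=7,ECX=0):EBX. */
typedef struct {
    unsigned    bit;
    const char *name;
} Avx512Flag;

static const Avx512Flag avx512_flags[] = {
    {16, "F"},  {17, "DQ"}, {26, "PF"}, {27, "ER"},
    {28, "CD"}, {30, "BW"}, {31, "VL"},
};

/* Writes the space-separated subset names found in ebx into out and
   returns how many were written. Stops early if the buffer is too small. */
static int avx512_decode(uint32_t ebx, char *out, size_t out_size)
{
    int    count = 0;
    size_t used  = 0;

    if (out_size > 0)
        out[0] = '\0';

    for (size_t i = 0; i < sizeof avx512_flags / sizeof avx512_flags[0]; ++i) {
        if ((ebx >> avx512_flags[i].bit) & 1u) {
            int n = snprintf(out + used, out_size - used, "%s%s",
                             count ? " " : "", avx512_flags[i].name);
            if (n < 0 || (size_t)n >= out_size - used)
                break;              /* output did not fit, give up */
            used += (size_t)n;
            ++count;
        }
    }
    return count;
}
```

Keeping the decoder separate from the CPUID call means it can be checked with hand-made register values on any machine, even one without AVX-512.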
All together: a lot of work for assembler and compiler builders. The Knights Landing architecture uses AVX-512 F, CD, PF, ER, while Skylake-X is documented by Intel as supporting AVX-512 F, CD, BW, DQ, VL. (ER belongs to the Knights Landing family; my tool currently prints ER on my Skylake-X machine, so that bit test has to be rechecked.) It also seems to me that Intel plans further instruction set extensions for the future. Here is the output of my instruction set detection:
Supported Features by Processor and Operating System
====================================================
Vendor String: GenuineIntel
Brand String: Intel(R)Core(TM)i7-7820XCPU@3.60GHz
Instruction Sets
----------------
MMX SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVX2
AVX-512 F - Fundamental Instructions
AVX-512 DQ - Double and Quad Word Instructions
AVX-512 CD - Conflict Detection Instructions
AVX-512 BW - Byte and Word Support Instructions
AVX-512 ER - Exponential and Reciprocal Instructions
Please, press enter to end the application ...
So I have no other choice: in the future I'll have to implement the AVX-512 detection in a separate procedure, because the situation is very confusing. But for now it works fine and has been tested under Windows 10 on the Skylake architecture. Other processors (AMD etc.) will give different output, of course. Test results and suggestions for improvement are very welcome. Have fun.
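For anyone who wants to experiment with such a separate procedure, here is a rough sketch in C. It is not the code from ISets; it assumes gcc or clang on x86 (it relies on their `<cpuid.h>` helpers and on inline assembly for XGETBV), and the names `os_enables_zmm` and `avx512f_usable` are invented for this example. The idea is that the processor bit alone isn't enough: the operating system must also have enabled the opmask and ZMM state in XCR0.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* XCR0 bits: 1 = SSE, 2 = AVX, 5 = opmask, 6 = ZMM_Hi256, 7 = Hi16_ZMM. */
#define XCR0_AVX512_MASK 0xE6u

/* Returns 1 if the OS saves/restores all state needed for 512-bit code. */
static int os_enables_zmm(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
}

/* Returns 1 if AVX-512 F is supported by both processor and OS, else 0. */
static int avx512f_usable(void)
{
#ifdef HAVE_X86_CPUID
    unsigned eax, ebx, ecx, edx;
    uint32_t lo, hi;

    /* Leaf 1, ECX bit 27 (OSXSAVE): XGETBV may be executed. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))
        return 0;

    /* Read XCR0 (ECX = 0) and check the state the OS has enabled. */
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    if (!os_enables_zmm(((uint64_t)hi << 32) | lo))
        return 0;

    /* Leaf 7, sub-leaf 0, EBX bit 16: AVX-512 F. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 16) & 1u);
#else
    return 0;   /* not an x86 gcc/clang build: report "not usable" */
#endif
}
```

The other subsets (CD, BW, DQ and so on) can be tested the same way from the EBX value of leaf 7 once this check has passed. With VS the same logic should be possible through its own intrinsics, but I haven't written that variant here.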
Gunther