New CPU instructions - Support update - Work around through a macro

Started by LiaoMi, November 28, 2021, 09:00:59 PM

LiaoMi

Hi,

How can I use new instructions if the assembler does not support them? I see two options: use macros, or assemble a separate object file with Microsoft's assembler.

Galois Field New Instructions (GFNI)
EVEX-encoded Galois field new instructions:

Instruction   Description   Status
VGF2P8AFFINEINVQB   Galois field affine transformation inverse   Supported, Enabled
VGF2P8AFFINEQB   Galois field affine transformation   Supported, Enabled
VGF2P8MULB   Galois field multiply bytes   Supported, Enabled

IA AVX-512 Neural Network Instructions (AVX512_VNNI)   Supported, Enabled
IA AVX-512 Vector Bit Manipulation Instructions (AVX512_VBMI)   Supported, Enabled
IA AVX-512 Vector Bit Manipulation Instructions 2 (AVX512_VBMI2)   Supported, Enabled

AVX-512 Vector Neural Network Instructions (VNNI) - x86 - https://en.wikichip.org/wiki/x86/avx512_vnni
The major motivation behind the AVX512 VNNI extension is the observation that many tight convolutional neural network loops repeatedly multiply two 16-bit or two 8-bit values and accumulate the result into a 32-bit accumulator. With AVX-512 Foundation, the 16-bit case takes two instructions: VPMADDWD, which multiplies pairs of 16-bit values and adds each pair of products together, followed by VPADDD, which adds the result to the accumulator.

VPMADDWD xmm1, xmm2, xmm3/m128
VPMADDWD ymm1, ymm2, ymm3/m256
VPMADDWD xmm1 {k1}{z}, xmm2, xmm3/m128
VPMADDWD ymm1 {k1}{z}, ymm2, ymm3/m256
VPMADDWD zmm1 {k1}{z}, zmm2, zmm3/m512
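
To make the two-instruction sequence concrete, here is a minimal Python model of what VPMADDWD followed by VPADDD computes per 32-bit lane. It is only a sketch of the arithmetic, not of the encoding; the function names are made up for illustration. VNNI's VPDPWSSD performs the same multiply-add-accumulate in a single instruction.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vpmaddwd(a_words, b_words):
    """Model VPMADDWD: multiply signed 16-bit elements pairwise and add each
    adjacent pair of products into one 32-bit lane (wrapping on overflow)."""
    assert len(a_words) == len(b_words) and len(a_words) % 2 == 0
    lanes = []
    for i in range(0, len(a_words), 2):
        lo = to_signed(a_words[i], 16) * to_signed(b_words[i], 16)
        hi = to_signed(a_words[i + 1], 16) * to_signed(b_words[i + 1], 16)
        lanes.append(to_signed(lo + hi, 32))
    return lanes


def vpaddd(acc, addend):
    """Model VPADDD: lane-wise 32-bit addition with wraparound."""
    return [to_signed(x + y, 32) for x, y in zip(acc, addend)]


def dot_accumulate(acc, a_words, b_words):
    """The two-step sequence from the text: VPMADDWD, then VPADDD."""
    return vpaddd(acc, vpmaddwd(a_words, b_words))
```

For example, `dot_accumulate([10, -5], [1, 2, 3, 4], [5, 6, 7, 8])` adds 1*5 + 2*6 to the first accumulator lane and 3*7 + 4*8 to the second.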

Combined Volume Set of Intel® 64 and IA-32 Architectures Software Developer's Manuals (PDF) - https://www.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-sdm-combined-volumes-1-2a-2b-2c-2d-3a-3b-3c-3d-and-4.html

The question is: do we have any universal macros for describing instructions that the assembler does not support?

Thanks!

jj2007

No, we don't have any universal macros for that, but if you come up with understandable rules for creating such commands, everything is possible :cool:

raymond

The simplest option would be to write your own macros which would insert the required code, such as what was suggested for the FISTTP instruction in the FPU tutorial when an assembler would not recognize that instruction.
Whenever you assume something, you risk being wrong half the time.
http://www.ray.masmcode.com

daydreamer

Check which EVEX prefix you need to add to the existing instructions to get the newer AVX-512 forms, then write a macro that wraps the EVEX prefix plus the old mnemonic into the new mnemonic.
I'm not sure it works for EVEX, but it did work for me when making SSE2 macros for ml 6.14.
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding
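
A note on the prefix idea above: with SSE2 a single 66h byte in front of the MMX encoding was enough, but EVEX is a four-byte prefix (62h plus three payload bytes) that also absorbs the 0F/0F38/0F3A escape bytes, the 66/F3/F2 prefix, and the extra source register. A macro therefore has to emit the whole instruction, not just a prefix. The Python sketch below computes those bytes for the simplest case only (register operands, no masking, no broadcast); `evex_reg_reg` and its parameters are hypothetical names for illustration.

```python
def evex_reg_reg(opcode, dest, src1, src2, opmap=1, pp=1, w=0, vector_bits=512):
    """Bytes of an unmasked register-to-register EVEX instruction.

    opmap: 1 = 0F, 2 = 0F38, 3 = 0F3A;  pp: 0 = none, 1 = 66, 2 = F3, 3 = F2.
    dest -> ModRM.reg, src1 -> EVEX.vvvv, src2 -> ModRM.rm (registers 0..31).
    Memory operands, masking and broadcast are deliberately not handled.
    """
    ll = {128: 0, 256: 1, 512: 2}[vector_bits]   # vector length field L'L
    r = (dest >> 3) & 1      # bit 3 of the ModRM.reg register number
    r2 = (dest >> 4) & 1     # bit 4 of the ModRM.reg register number (R')
    b = (src2 >> 3) & 1      # bit 3 of the ModRM.rm register number
    x = (src2 >> 4) & 1      # bit 4 of ModRM.rm when rm is a register
    v2 = (src1 >> 4) & 1     # bit 4 of the vvvv register number (V')
    # R, X, B, R', vvvv and V' are all stored inverted.
    p0 = ((r ^ 1) << 7) | ((x ^ 1) << 6) | ((b ^ 1) << 5) | ((r2 ^ 1) << 4) | opmap
    p1 = (w << 7) | ((~src1 & 0xF) << 3) | 0x04 | pp
    p2 = (ll << 5) | ((v2 ^ 1) << 3)             # z = 0, b = 0, aaa = 000
    modrm = 0xC0 | ((dest & 7) << 3) | (src2 & 7)
    return bytes([0x62, p0, p1, p2, opcode, modrm])


# vpaddd zmm1, zmm2, zmm3      (EVEX.512.66.0F.W0 FE /r)
print(evex_reg_reg(0xFE, 1, 2, 3).hex(" "))
# vgf2p8mulb zmm1, zmm2, zmm3  (EVEX.512.66.0F38.W0 CF /r)
print(evex_reg_reg(0xCF, 1, 2, 3, opmap=2).hex(" "))
```

The resulting byte list could then be pasted into a `db` line inside a macro, in the same way as the FPU and SSE macros in this thread. Any such bytes should be checked against a disassembler before being relied on.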

habran

It would be complicated to create macros for those instructions, but easy to add them to the assembler's source code.
I have been completely taken up by my work for a very long time; however, johnsa may perhaps have some time to add this to the source.
Cod-Father

johnsa

Hey Habran,

Haven't seen you around in ages! - Glad to know you're still alive :)

LiaoMi, could you please create this as an issue on the GitHub issues page so I can track all the things that need doing in one place?

I too have had very little time for anything over the last year, but I will try and get around to adding things soon.

We created a new codegen, CodeGenV2, in the source and were in the process of migrating instructions from the old one into it.
Additionally, all new instructions have been added there only. So anyone can have a look at that file and see the instruction entry format; it's not too hard to add new ones, as long as they can still be encoded as regular EVEX and Intel doesn't go and completely change the encoding.

John

LiaoMi

Hi Raymond,

thanks, I'll try!  :thup:

If these newer instructions are not supported by the assembler, macros could be prepared to hard-code them as above. An example of such a macro would be as follows:
   fcomist MACRO i
      db    0dbh,0f0h+i      ; DB F0+i is the encoding of FCOMI ST(0),ST(i)
   ENDM

which could then be used in the above example code as follows:
fcomist 2
http://www.ray.masmcode.com/tutorial/fpuchap7.htm
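
As a quick sanity check of what that macro emits, the same two bytes can be computed in Python. This is only an illustration; `fcomi_bytes` is a made-up helper name, and the range check is something the macro itself does not do.

```python
def fcomi_bytes(i):
    """Bytes emitted by the macro above for stack register index i (0..7)."""
    if not 0 <= i <= 7:
        raise ValueError("x87 stack index must be between 0 and 7")
    return bytes([0xDB, 0xF0 + i])


print(fcomi_bytes(2).hex(" "))   # bytes for 'fcomist 2'
```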

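A few more examples:
; xorps xmm<XMMReg1>, xmm<XMMReg2>: 0F 57 /r with ModRM mod=11
xorps           macro   XMMReg1, XMMReg2
                db      0FH, 057H, 0C0H + (XMMReg1 * 8) + XMMReg2
                endm

; movntps [GeneralReg + Offset], xmm<XMMReg>: 0F 2B /r with ModRM mod=01 (8-bit displacement)
movntps         macro   GeneralReg, Offset, XMMReg
                db      0FH, 02BH, 040H + (XMMReg * 8) + GeneralReg, Offset
                endm

; movaps xmm<XMMReg>, [GeneralReg]: 0F 28 /r with ModRM rm=100 followed by a SIB byte
movaps_load     macro   XMMReg, GeneralReg
                db      0FH, 028H, (XMMReg * 8) + 4, (4 * 8) + GeneralReg
                endm

; movaps [GeneralReg], xmm<XMMReg>: 0F 29 /r with ModRM rm=100 followed by a SIB byte
movaps_store    macro   GeneralReg, XMMReg
                db      0FH, 029H, (XMMReg * 8) + 4, (4 * 8) + GeneralReg
                endm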

        ; rECX and rESP are assumed to be numeric equates for the register
        ; numbers (ECX = 1, ESP = 4), defined elsewhere in the original code
        xorps   0, 0                            ; zero xmm0 (128 bits)
        movntps rECX, 0,  0                     ; store bytes  0 - 15
        movntps rECX, 16, 0                     ;             16 - 31
        movntps rECX, 32, 0                     ;             32 - 47
        movntps rECX, 48, 0                     ;             48 - 63

        sti                                     ; reenable context switching
        movaps_store rESP, 0                    ; save xmm0
        ;ZeroMem call
        movaps_load  0, rESP                    ; restore xmm0
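
The arithmetic inside these `db` lines is just the ModRM (and SIB) byte layout. The Python sketch below mirrors the four macros so the emitted bytes can be inspected; the register numbers `ECX = 1` and `ESP = 4` are assumptions about what `rECX` and `rESP` stand for, and the helper names are invented for illustration.

```python
ECX, ESP = 1, 4   # assumed register numbers behind rECX / rESP


def modrm(mod, reg, rm):
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def xorps(dst, src):
    # mod=11: both operands are registers
    return bytes([0x0F, 0x57, modrm(3, dst, src)])


def movntps(base, disp8, xmm):
    # mod=01: memory operand [base + 8-bit displacement]
    return bytes([0x0F, 0x2B, modrm(1, xmm, base), disp8 & 0xFF])


def movaps_load(xmm, base):
    # mod=00, rm=100: a SIB byte follows; SIB index=100 means "no index"
    return bytes([0x0F, 0x28, modrm(0, xmm, 4), (4 << 3) | base])


def movaps_store(base, xmm):
    return bytes([0x0F, 0x29, modrm(0, xmm, 4), (4 << 3) | base])


print(movntps(ECX, 16, 0).hex(" "))      # movntps [ecx+16], xmm0
print(movaps_store(ESP, 0).hex(" "))     # movaps  [esp], xmm0
```

Two limits of the macros show up here: `movntps` has no SIB byte, so it cannot take ESP as the base register, and the SIB form used by `movaps_load`/`movaps_store` does not mean `[ebp]` when the base is 5, because mod=00 with base=101 selects a 32-bit displacement with no base register.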


Quote from: daydreamer on November 29, 2021, 07:32:56 AM
Check which EVEX prefix you need to add to the existing instructions to get the newer AVX-512 forms, then write a macro that wraps the EVEX prefix plus the old mnemonic into the new mnemonic.
I'm not sure it works for EVEX, but it did work for me when making SSE2 macros for ml 6.14.

Hi daydreamer,

thanks, very interesting technique! We have something similar in NASM macros - https://github.com/webmproject/libvpx/blob/master/third_party/x86inc/x86inc.asm
; Macros for converting VEX instructions to equivalent EVEX ones.
%macro EVEX_INSTR 2-3 0 ; vex, evex, prefer_evex
    %macro %1 2-7 fnord, fnord, %1, %2, %3
        %ifidn %3, fnord
            %define %%args %1, %2
        %elifidn %4, fnord
            %define %%args %1, %2, %3
        %else
            %define %%args %1, %2, %3, %4
        %endif
        %assign %%evex_required cpuflag(avx512) & %7
        %ifnum regnumof%1
            %if regnumof%1 >= 16 || sizeof%1 > 32
                %assign %%evex_required 1
            %endif
        %endif
        %ifnum regnumof%2
            %if regnumof%2 >= 16 || sizeof%2 > 32
                %assign %%evex_required 1
            %endif
        %endif
        %ifnum regnumof%3
            %if regnumof%3 >= 16 || sizeof%3 > 32
                %assign %%evex_required 1
            %endif
        %endif
        %if %%evex_required
            %6 %%args
        %else
            %5 %%args ; Prefer VEX over EVEX due to shorter instruction length
        %endif
    %endmacro
%endmacro

EVEX_INSTR vbroadcastf128, vbroadcastf32x4
EVEX_INSTR vbroadcasti128, vbroadcasti32x4
EVEX_INSTR vextractf128,   vextractf32x4
EVEX_INSTR vextracti128,   vextracti32x4
EVEX_INSTR vinsertf128,    vinsertf32x4
EVEX_INSTR vinserti128,    vinserti32x4
EVEX_INSTR vmovdqa,        vmovdqa32
EVEX_INSTR vmovdqu,        vmovdqu32
EVEX_INSTR vpand,          vpandd
EVEX_INSTR vpandn,         vpandnd
EVEX_INSTR vpor,           vpord
EVEX_INSTR vpxor,          vpxord
EVEX_INSTR vrcpps,         vrcp14ps,   1 ; EVEX versions have higher precision
EVEX_INSTR vrcpss,         vrcp14ss,   1
EVEX_INSTR vrsqrtps,       vrsqrt14ps, 1
EVEX_INSTR vrsqrtss,       vrsqrt14ss, 1
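
Read as plain logic, the NASM macro above picks the EVEX mnemonic when any of the first three register operands has a number of 16 or higher or is wider than 32 bytes, or when AVX-512 is enabled and the instruction is flagged prefer_evex; otherwise it keeps the shorter VEX form. A rough Python restatement follows. The function names and the `(register_number, size_in_bytes)` operand tuples are illustrative assumptions, including the assumption that `sizeof` in x86inc counts bytes.

```python
def needs_evex(operands, avx512_enabled=False, prefer_evex=False):
    """operands: up to three entries, each a (register_number, size_in_bytes)
    tuple for a register operand, or None for anything else (memory, immediate)."""
    if avx512_enabled and prefer_evex:
        return True
    for operand in operands[:3]:
        if operand is None:
            continue
        regnum, size = operand
        if regnum >= 16 or size > 32:    # xmm16+/ymm16+ or any zmm register
            return True
    return False


def pick_mnemonic(vex_name, evex_name, operands,
                  avx512_enabled=False, prefer_evex=False):
    """Return the mnemonic the wrapper macro would end up emitting."""
    if needs_evex(operands, avx512_enabled, prefer_evex):
        return evex_name
    return vex_name   # VEX is preferred because its encoding is shorter


# ymm0..ymm2 stay VEX; using ymm16 forces the EVEX form
print(pick_mnemonic("vpxor", "vpxord", [(0, 32), (1, 32), (2, 32)]))
print(pick_mnemonic("vpxor", "vpxord", [(16, 32), (1, 32), (2, 32)]))
```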


Quote from: habran on December 01, 2021, 10:34:34 PM
It would be complicated to create macros for those instructions, but easy to add them to the assembler's source code.
I have been completely taken up by my work for a very long time; however, johnsa may perhaps have some time to add this to the source.

Hi Habran,

I guess there are too many instructions like this, so I didn't want to burden you with this work  :tongue:

Quote from: johnsa on December 03, 2021, 08:11:24 PM

Hi John,

I will try to create a list of the new instructions that are not yet available and compare it with Intel's documentation. When I finish, I will publish the list here and on GitHub! Thanks!

habran

Hi friends :biggrin:
I have been too busy for a long time already; however, I'll be back (tornero)
I am very alive and in good condition
Thanks everyone :thumbsup:
Cod-Father