VPERMILPS Bug?!

Started by LiaoMi, August 01, 2019, 10:57:02 PM


LiaoMi

Hi,

I think I found a bug. While experimenting with Lanczos interpolation, I forgot to remove the tail of another instruction, so a line carrying an extra, invalid operand (920h+var_920) was assembled anyway:

Quote
ownRowLanczos32pl proc near
var_920         = dword ptr -920h
...
VPERMILPS ymm0, ymm0, ymm1, 920h+var_920
                ;VPERMILPS ymm1, ymm2, ymm3/m256   
                ;Description - RVM  V/V   AVX   Permute single-precision floating-point values in ymm2 using controls from ymm3/mem and store result in ymm1.
                ;VPERMILPS ymm1, ymm0, ymm1, 920h+var_920

will be assembled into

Quote
vpermilps ymm0, ymm1, 0

which seems wrong to me  :rolleyes:. There is no warning that the instruction is invalid. At first I thought the third operand was wrong as well, but it is fine: var_920 is -920h, so 920h+var_920 = 0 and zero is the correct value.

aw27

I have seen a few wrong AVX instructions that assemble without error to something completely different. A simple one: vmovaps ymm0, 2222

johnsa

There are currently two code generators in UASM. The original one, inherited from WASM/JWasm and modified over the years, is pretty awful and is the source of all these issues. I started replacing it in January with a new CodeGenV2. In both of these cases the new CodeGenV2 correctly identifies the instruction as invalid. However, to maintain compatibility and not break the product entirely mid-stream, whenever the V2 generator can't find a valid instruction it falls back to the legacy one, which creates the nonsense. Once ALL instructions are migrated to the new generator, the old one will be removed completely and these issues will be a thing of the past (plus the new generator is a lot cleaner and faster than the old one).

Basically, I'm not fixing anything in the old code-gen unless it's totally unavoidable.. :)
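
To make the fallback behaviour concrete, here is a minimal Python sketch of the dispatch described above. It is only a model: the names (V2_FORMS, codegen_v2, codegen_legacy, assemble, allow_fallback) are hypothetical and are not UASM's real identifiers, and the legacy generator is reduced to a stub that accepts whatever it is handed.

```python
# Hypothetical model of the two-generator dispatch; every name here is
# illustrative and does not come from the UASM sources.

class InvalidInstruction(Exception):
    """Raised when no generator is allowed to accept the instruction."""

# Operand signatures the (hypothetical) new table knows for VPERMILPS.
V2_FORMS = {
    ("vpermilps", ("ymm", "ymm", "ymm")),
    ("vpermilps", ("ymm", "ymm", "imm8")),
}

def codegen_v2(mnemonic, operand_kinds):
    """Return a tag if the new table has this exact form, else None."""
    if (mnemonic, tuple(operand_kinds)) in V2_FORMS:
        return "v2"
    return None

def codegen_legacy(mnemonic, operand_kinds):
    """Stand-in for the old generator: it accepts anything it is given."""
    return "legacy"

def assemble(mnemonic, operand_kinds, allow_fallback=True):
    """Try the new generator first; optionally fall back to the old one."""
    result = codegen_v2(mnemonic, operand_kinds)
    if result is not None:
        return result
    if allow_fallback:
        # This is the path that lets an invalid form slip through silently.
        return codegen_legacy(mnemonic, operand_kinds)
    raise InvalidInstruction(f"{mnemonic} {', '.join(operand_kinds)}")
```

In this model the four-operand VPERMILPS from the first post is rejected by the new table but still accepted through the fallback; with allow_fallback=False it would be reported as an error instead.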

LiaoMi

Quote from: AW on August 02, 2019, 12:40:09 AM
I have seen a few wrong AVX instructions that assemble without error to something completely different. A simple one: vmovaps ymm0, 2222

Hi AW,

I also decided to check the ones I had on hand, but I didn't find anything on the first try. It would be great to automate such checks  :icon_idea:

Compiler Fuzzing With Prog-Fuzz Is Turning Up Bugs In GCC, Clang
Quote
Vegard Nossum of Oracle has been working on fuzzing different open-source compilers to turn up bugs in compilers like GCC and Clang.

Vegard ended up writing a new compiler fuzzer from scratch making use of AFL instrumentation. This new fuzzer is dubbed simply Prog-Fuzz and is available on GitHub https://github.com/vegard/prog-fuzz.

Over the past few months, he has uncovered more than 100 different GCC compiler bugs while about three dozen of them are fixed so far. Most of these bugs cause the compiler to crash with compiler errors, assertion failures, or segmentation faults. At least 9 new bugs were also uncovered in the LLVM/Clang compiler.

LiaoMi

Quote from: johnsa on August 02, 2019, 01:14:35 AM
There are currently two code generators in UASM. The original one, inherited from WASM/JWasm and modified over the years, is pretty awful and is the source of all these issues. I started replacing it in January with a new CodeGenV2. In both of these cases the new CodeGenV2 correctly identifies the instruction as invalid. However, to maintain compatibility and not break the product entirely mid-stream, whenever the V2 generator can't find a valid instruction it falls back to the legacy one, which creates the nonsense. Once ALL instructions are migrated to the new generator, the old one will be removed completely and these issues will be a thing of the past (plus the new generator is a lot cleaner and faster than the old one).

Basically, I'm not fixing anything in the old code-gen unless it's totally unavoidable.. :)

Ah, so that's how it is, now it's clear  :biggrin: A nice, tricky way to refactor code  :thumbsup:

aw27

After assembling, the next step is to disassemble to confirm all is well. I can live with that; I am not using AVX much.  :biggrin:

johnsa

Hopefully in a few months it will be a non-issue.. but re-creating the code gen is a painful process.. I've created a regression test per instruction, so we move one at a time to make sure it's right.. it's laborious!
Oh for the good ol' days of a small ISA with 50 or so instructions.. not the 700 or whatever we have now!

LiaoMi

Hi johnsa,

Is it time-consuming to update each instruction? Do you need help?

johnsa

It is tedious; we could definitely use a hand creating per-instruction regression tests.

Creating the actual instruction entries isn't too bad as the format has been designed to match very closely with the instruction manuals.

habran

At this moment we need the test for crc32 with all possible combinations:
    CRC32 r32, r/m8        F2 0F 38 F0 /r
    CRC32 r32, r/m8 *      F2 REX 0F 38 F0 /r
    CRC32 r32, r/m16       F2 0F 38 F1 /r
    CRC32 r32, r/m32       F2 0F 38 F1 /r
    CRC32 r64, r/m8        F2 REX.W 0F 38 F0 /r
    CRC32 r64, r/m64       F2 REX.W 0F 38 F1 /r


"crc32",   2, { R32,      R8      },
"crc32",   2, { R32,      R8H     },
"crc32",   2, { R32,      R8E     },
"crc32",   2, { R32,      R8U     },
"crc32",   2, { R32E,     R8      },
"crc32",   2, { R32E,     R8E     },
"crc32",   2, { R32E,     R8U     },
"crc32",   2, { R64,      R8      },
"crc32",   2, { R64,      R8E     },
"crc32",   2, { R64,      R8U     },
"crc32",   2, { R64E,     R8      },
"crc32",   2, { R64E,     R8U     },
"crc32",   2, { R64E,     R8E     },
"crc32",   2, { R64,      R8U     },
"crc32",   2, { R32,      R16     },
"crc32",   2, { R32E,     R16     },
"crc32",   2, { R32,      R32     },
"crc32",   2, { R32E,     R32E    },
"crc32",   2, { R32E,     R32     },
"crc32",   2, { R32,      R32E    },
"crc32",   2, { R64,      R64     },
"crc32",   2, { R64E,     R64E    },
"crc32",   2, { R64,      R64E    },
"crc32",   2, { R64E,     R64     },
"crc32",   2, { R32,      M8      },
"crc32",   2, { R32,      M16     },
"crc32",   2, { R32,      M32     },
"crc32",   2, { R64,      M8      },
"crc32",   2, { R64,      M64     },
Cod-Father

johnsa

I can walk somebody through how we create/run the regression tests if someone wants to have a go at one :)

aw27

I believe LiaoMi may be able to adapt his Haskell project to generate AVX2 instructions for testing. Integrating it with a 100% C codebase is the challenge.

habran

I have volunteered to create a test piece for crc32 :biggrin:
crc32 r8,r9                  ;F2 4D 0F 38 F1 C1                 crc32       r8,r9 
crc32 ecx, cl                ;F2 0F 38 F0 C9                    crc32       ecx,cl 
crc32 ecx, ch                ;F2 0F 38 F0 CD                    crc32       ecx,ch 
crc32 ecx, r10b              ;F2 41 0F 38 F0 CA                 crc32       ecx,r10b 
crc32 ecx, sil               ;F2 40 0F 38 F0 CE                 crc32       ecx,sil 
crc32 r10d, al               ;F2 44 0F 38 F0 D0                 crc32       r10d,al 
crc32 r10d, r10b             ;F2 45 0F 38 F0 D2                 crc32       r10d,r10b 
crc32 r10d, sil              ;F2 44 0F 38 F0 D6                 crc32       r10d,sil 
crc32 rcx, al                ;F2 48 0F 38 F0 C8                 crc32       rcx,al 
crc32 rcx, r10b              ;F2 49 0F 38 F0 CA                 crc32       rcx,r10b 
crc32 rcx, sil               ;F2 48 0F 38 F0 CE                 crc32       rcx,sil 
crc32 r10, al                ;F2 4C 0F 38 F0 D0                 crc32       r10,al 
crc32 r10, sil               ;F2 4C 0F 38 F0 D6                 crc32       r10,sil 
crc32 r10, r10b              ;F2 4D 0F 38 F0 D2                 crc32       r10,r10b 
crc32 ecx, ax                ;66 F2 0F 38 F1 C8                 crc32       ecx,ax 
crc32 r10d, bx               ;66 F2 44 0F 38 F1 D3              crc32       r10d,bx 
crc32 ecx, r10w              ;66 F2 41 0F 38 F1 CA              crc32       ecx,r10w 
crc32 r10d,r10w              ;66 F2 45 0F 38 F1 D2              crc32       r10d,r10w 
crc32 ecx, ecx               ;F2 0F 38 F1 C9                    crc32       ecx,ecx 
crc32 r10d, r11d             ;F2 45 0F 38 F1 D3                 crc32       r10d,r11d 
crc32 r10d, ecx              ;F2 44 0F 38 F1 D1                 crc32       r10d,ecx 
crc32 ecx, r11d              ;F2 41 0F 38 F1 CB                 crc32       ecx,r11d 
crc32 rcx, rcx               ;F2 48 0F 38 F1 C9                 crc32       rcx,rcx 
crc32 r10, r11               ;F2 4D 0F 38 F1 D3                 crc32       r10,r11 
crc32 rcx, r10               ;F2 49 0F 38 F1 CA                 crc32       rcx,r10 
crc32 r10, rcx               ;F2 4C 0F 38 F1 D1                 crc32       r10,rcx 
crc32 ecx, dbVar             ;F2 0F 38 F0 0D 4B 2F 00 00        crc32       ecx,byte ptr [dbVar (0404000h)] 
crc32 ecx, dwVar             ;66 F2 0F 38 F1 0D 42 2F 00 00     crc32       ecx,word ptr [dwVar (0404001h)]
crc32 ecx, ddVar             ;F2 0F 38 F1 0D 3B 2F 00 00        crc32       ecx,dword ptr [ddVar (0404003h)] 
crc32 rcx, qvar              ;F2 48 0F 38 F1 0D 35 2F 00 00     crc32       rcx,qword ptr [qvar (0404007h)]
crc32 rcx, dbVar             ;F2 48 0F 38 F0 0D 24 2F 00 00     crc32       rcx,byte ptr [dbVar (0404000h)]
crc32 r10d, dbVar            ;F2 44 0F 38 F0 15 1A 2F 00 00     crc32       r10d,byte ptr [dbVar (0404000h)] 
crc32 r10d, dwVar            ;66 F2 44 0F 38 F1 15 10 2F 00 00  crc32       r10d,word ptr [dwVar (0404001h)]
crc32 r10d, ddVar            ;F2 44 0F 38 F1 15 08 2F 00 00     crc32       r10d,dword ptr [ddVar (0404003h)] 
crc32 r10,  qvar             ;F2 4C 0F 38 F1 15 02 2F 00 00     crc32       r10,qword ptr [qvar (0404007h)] 
crc32 r10,  dbVar            ;F2 4C 0F 38 F0 15 F1 2E 00 00     crc32       r10,byte ptr [dbVar (0404000h)] 
Cod-Father
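
The register-to-register lines in the listing above can be cross-checked with a small independent encoder. This is a Python sketch under stated assumptions: 64-bit mode only, register operands only (no memory forms), and the names REGS, encode_crc32 and hexbytes are my own, not anything from UASM. The layout follows the encoding table earlier in the thread: an optional 66 prefix for a 16-bit source, F2, an optional REX byte, 0F 38, F0 for a byte source or F1 otherwise, then a ModRM byte with the destination in the reg field.

```python
# name -> (register number, size in bits, needs a REX prefix, is AH/CH/DH/BH)
REGS = {}

def _add(names, size, start=0, needs_rex=False, high=False):
    for offset, name in enumerate(names):
        REGS[name] = (start + offset, size, needs_rex, high)

_add(["al", "cl", "dl", "bl"], 8)
_add(["ah", "ch", "dh", "bh"], 8, start=4, high=True)
_add(["spl", "bpl", "sil", "dil"], 8, start=4, needs_rex=True)
_add(["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"], 16)
_add(["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"], 32)
_add(["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"], 64)
for n in range(8, 16):
    for suffix, size in (("b", 8), ("w", 16), ("d", 32), ("", 64)):
        REGS[f"r{n}{suffix}"] = (n, size, False, False)

# Source sizes allowed for each destination size, per the encoding table.
ALLOWED = {32: (8, 16, 32), 64: (8, 64)}

def encode_crc32(dst, src):
    """Encode 'crc32 dst, src' for two register operands in 64-bit mode."""
    d_num, d_size, _, _ = REGS[dst]
    s_num, s_size, s_needs_rex, s_high = REGS[src]
    if d_size not in ALLOWED or s_size not in ALLOWED[d_size]:
        raise ValueError(f"no crc32 form for {dst}, {src}")
    rex = 0
    if d_size == 64:
        rex |= 0x48          # REX.W
    if d_num >= 8:
        rex |= 0x44          # REX.R extends the ModRM reg field
    if s_num >= 8:
        rex |= 0x41          # REX.B extends the ModRM r/m field
    if s_needs_rex:
        rex |= 0x40          # spl/bpl/sil/dil need a REX prefix
    if s_high and rex:
        raise ValueError(f"{src} cannot be encoded together with REX")
    out = []
    if s_size == 16:
        out.append(0x66)     # operand-size prefix
    out.append(0xF2)
    if rex:
        out.append(rex)
    out += [0x0F, 0x38, 0xF0 if s_size == 8 else 0xF1]
    out.append(0xC0 | ((d_num & 7) << 3) | (s_num & 7))
    return bytes(out)

def hexbytes(data):
    """Format bytes the way the listing does, e.g. 'F2 0F 38 F0 C9'."""
    return " ".join(f"{b:02X}" for b in data)
```

For example, hexbytes(encode_crc32("r10d", "bx")) gives 66 F2 44 0F 38 F1 D3, which matches the corresponding line of the listing. A high-byte source combined with any operand that requires REX (such as r10d, ch) raises ValueError, which fits the absence of such rows in the operand table.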

johnsa

The approach I take is as follows:

1) Create a plain BIN source file for 32-bit and 64-bit per instruction. These can be found in the regress/src folder.
2) The instruction must be tested in every possible combination (and this is the hard part): sil/dil vs. high-byte registers, registers whose number is <8, <16, <32, and combinations of those,
    combined with an array of addressing modes for memory operands, once again with various registers from the different number banks.
3) Additionally, an error src file is created to specifically test variations of the instruction which should fail.
4) If the instruction supports various forms of prefixes, these must be included too.

Once that is all done, the regress test can automatically run it and compare the output BIN to a known-good/expected result file. This is where the second slow part of the process comes in:

5) Use UASM to assemble the BIN source file, take the resulting HEX file, and use Defuse to manually go through each opcode and validate it. In addition, I assemble the same instructions via the Defuse interface to ensure we have selected the correct opcode and encoding (i.e. the optimal, shorter sequences). This result is then used to verify the expected result file, which is stored in regress/exp.
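
The automatic comparison against the expected file can be sketched as follows. This is a minimal Python illustration, not the actual regress runner: first_mismatch and check_case are hypothetical helpers, and reading the assembled output and the regress/exp file into bytes is left to the caller.

```python
def first_mismatch(actual, expected):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return offset
    if len(actual) != len(expected):
        # One output is a strict prefix of the other.
        return min(len(actual), len(expected))
    return None

def check_case(name, actual, expected):
    """Summarise one regression case as a single report line."""
    offset = first_mismatch(actual, expected)
    if offset is None:
        return f"{name}: OK"
    return f"{name}: FAIL at offset 0x{offset:X}"
```

Reporting the first differing offset makes it easier to map a failure back to a specific instruction in the source file, since each line of a per-instruction test assembles to a known number of bytes.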


LiaoMi

Quote from: AW on August 07, 2019, 03:19:06 AM
I believe LiaoMi may be able to adapt his Haskell project to generate AVX2 instructions for testing. Integrating it with a 100% C codebase is the challenge.

Hi AW,

it was this project that I took as a basis =)

Quote from: habran on August 06, 2019, 11:32:52 PM
At this moment we need the test for crc32 with all possible combinations:

Quote from: habran on August 07, 2019, 03:46:36 PM
I have volunteered to create a test piece for crc32 :biggrin:

Hi habran,

we knew that asking for help is the best motivation to solve a problem yourself))))) thanks for volunteering!

Quote from: johnsa on August 07, 2019, 06:21:16 PM
The approach I take is as follows:


Hi johnsa,

Thanks for the description! Now it's clear what the basic structure looks like.