LiaoMi

  • Member
  • ****
  • Posts: 595
VPERMILPS Bug?!
« on: August 01, 2019, 10:57:02 PM »
Hi,

I think I found a bug. While experimenting with Lanczos interpolation, I forgot to remove the last part of another instruction, so the line below was assembled with a leftover fourth operand (920h+var_920).

Quote
ownRowLanczos32pl proc near
var_920         = dword ptr -920h
...
VPERMILPS ymm0, ymm0, ymm1, 920h+var_920
                ;VPERMILPS ymm1, ymm2, ymm3/m256   
                ;Description - RVM  V/V   AVX   Permute single-precision floating-point values in ymm2 using controls from ymm3/mem and store result in ymm1.
                ;VPERMILPS ymm1, ymm0, ymm1, 920h+var_920

will be assembled into

Quote
vpermilps ymm0, ymm1, 0

That looks wrong to me :rolleyes:: there is no warning that the instruction is invalid. At first I also thought something was wrong with the third operand, but that part is fine: 920h+var_920 = 920h - 920h = 0, so zero is the correct value.
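A strict operand-form check is what would catch a four-operand line like this. The sketch below is hypothetical and is not UASM's code. It models only the two 256-bit VEX forms of VPERMILPS: the variable-control form from the quoted comment, and the imm8 form that the assembler silently picked. It accepts a line only if the operand count and kinds match one form exactly. The `OpKind`/`Form` names are invented for illustration, and the xmm and EVEX forms are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand kinds, for illustration only. */
typedef enum { OP_YMM, OP_M256, OP_IMM8 } OpKind;

/* One accepted form: a fixed operand count and, per position,
   a bitmask of the operand kinds allowed there. */
typedef struct {
    int count;
    unsigned allowed[3];
} Form;

#define K(k) (1u << (k))

/* 256-bit VEX forms of VPERMILPS (xmm and EVEX forms omitted):
     VPERMILPS ymm1, ymm2, ymm3/m256
     VPERMILPS ymm1, ymm2/m256, imm8 */
static const Form vpermilps_forms[] = {
    { 3, { K(OP_YMM), K(OP_YMM), K(OP_YMM) | K(OP_M256) } },
    { 3, { K(OP_YMM), K(OP_YMM) | K(OP_M256), K(OP_IMM8) } },
};

/* Returns 1 if the operand list matches some form exactly, 0 otherwise.
   A wrong operand count never matches, so the line must be reported as
   an error instead of being re-interpreted as a different form. */
int vpermilps_operands_valid(const OpKind *ops, int n)
{
    assert(ops != NULL || n == 0);
    for (size_t f = 0; f < sizeof vpermilps_forms / sizeof vpermilps_forms[0]; f++) {
        const Form *form = &vpermilps_forms[f];
        if (form->count != n)
            continue;
        int ok = 1;
        for (int i = 0; i < n && ok; i++)
            if (!(form->allowed[i] & K(ops[i])))
                ok = 0;
        if (ok)
            return 1;
    }
    return 0;
}
```

With this check, `VPERMILPS ymm0, ymm0, ymm1, 920h+var_920` (four operands) is rejected rather than turned into `vpermilps ymm0, ymm1, 0`.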
« Last Edit: August 02, 2019, 03:08:17 AM by LiaoMi »

AW

  • Member
  • *****
  • Posts: 2442
  • Let's Make ASM Great Again!
Re: VPERMILPS Bug?!
« Reply #1 on: August 02, 2019, 12:40:09 AM »
I have seen a few invalid AVX instructions that assemble without error into something completely different. A simple example: vmovaps ymm0, 2222

johnsa

  • Member
  • ****
  • Posts: 791
    • Uasm
Re: VPERMILPS Bug?!
« Reply #2 on: August 02, 2019, 01:14:35 AM »
There are currently two code generators in UASM. The original one, inherited from WASM/JWasm and modified over the years, is pretty awful and is the source of all these issues. In January I started replacing it with a new one, CodeGenV2. In both of these cases, CodeGenV2 correctly identifies the line as an invalid instruction. However, to maintain compatibility and avoid breaking the product entirely mid-migration, whenever the V2 generator can't find a valid instruction it falls back to the legacy one, which produces the nonsense. Once ALL instructions are migrated to the new generator, the old one will be removed completely and these issues will be a thing of the past (plus the new generator is a lot cleaner and faster than the old one).

Basically, I'm not fixing anything in the old code generator unless it's totally unavoidable. :)
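The fallback described above, and a stricter variant that would avoid the nonsense output, can be sketched as a small decision function. Everything here is a hypothetical stand-in, not UASM code: the `V2Result` codes, `Action`, and `choose_codegen`. The only point is the control flow. In the strict variant, the legacy generator is used only when V2 has no entries for the mnemonic at all. When V2 knows the mnemonic but none of its forms match, the line is reported as an error.

```c
#include <assert.h>

/* Hypothetical outcome of looking a source line up in the V2 instruction tables. */
typedef enum {
    V2_ENCODED,        /* mnemonic migrated and an operand form matched */
    V2_NO_FORM_MATCH,  /* mnemonic migrated, but no form accepts these operands */
    V2_NOT_MIGRATED    /* mnemonic not in the V2 tables yet */
} V2Result;

typedef enum { EMIT_V2, EMIT_LEGACY, EMIT_ERROR } Action;

/* strict == 0: any V2 failure falls back to the legacy generator
               (the behaviour described above).
   strict != 0: fall back only for mnemonics V2 has never seen;
               a known mnemonic with unmatched operands is an error. */
Action choose_codegen(V2Result r, int strict)
{
    assert(r == V2_ENCODED || r == V2_NO_FORM_MATCH || r == V2_NOT_MIGRATED);
    switch (r) {
    case V2_ENCODED:
        return EMIT_V2;
    case V2_NO_FORM_MATCH:
        return strict ? EMIT_ERROR : EMIT_LEGACY;
    case V2_NOT_MIGRATED:
    default:
        return EMIT_LEGACY;
    }
}
```

There is a catch, and it may be why the blanket fallback exists. Suppose a mnemonic is only partially migrated, so its V2 entries do not yet cover every valid form. In strict mode, correct code using one of the missing forms would start being rejected.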

LiaoMi

Re: VPERMILPS Bug?!
« Reply #3 on: August 02, 2019, 02:58:31 AM »
Hi AW,

I also checked the ones I had on hand but didn't find anything on the first try. It would be great to automate such checks :icon_idea:

Compiler Fuzzing With Prog-Fuzz Is Turning Up Bugs In GCC, Clang
Quote
Vegard Nossum of Oracle has been fuzzing various open-source compilers to turn up bugs in compilers like GCC and Clang.

Vegard ended up writing a new compiler fuzzer from scratch making use of AFL instrumentation. This new fuzzer is dubbed simply Prog-Fuzz and is available on GitHub https://github.com/vegard/prog-fuzz.

Over the past few months, he has uncovered more than 100 different GCC bugs, of which about three dozen have been fixed so far. Most of these bugs cause the compiler to crash with compiler errors, assertion failures, or segmentation faults. At least 9 new bugs were also uncovered in the LLVM/Clang compiler.
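Prog-Fuzz itself relies on AFL instrumentation of the compiler. A much cruder way to automate checks for an assembler is to generate random operand combinations, assemble them, and compare each line against a disassembler or a second assembler. The generator half of that idea might look like the sketch below. The register pools and `fuzz_lines` are illustrative assumptions, and xorshift32 is used only so that a run can be reproduced from its seed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative operand pools; a real run would cover every register bank. */
static const char *const dst_pool[] = { "ecx", "r10d", "rcx", "r10", "ymm0" };
static const char *const src_pool[] = { "cl", "ch", "sil", "r10b", "ax",
                                        "r11d", "r9", "ymm1", "0" };

/* xorshift32: tiny deterministic PRNG, so a failing line can be
   regenerated from the same seed. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Write `count` random "mnemonic dst, src" lines to `out`.
   Many lines are invalid on purpose: the assembler must either reject
   them or emit bytes that disassemble back to the same instruction.
   Returns the number of lines written. */
int fuzz_lines(FILE *out, const char *mnemonic, uint32_t seed, int count)
{
    assert(out != NULL && mnemonic != NULL);
    assert(seed != 0); /* a zero state would stay zero forever */
    uint32_t s = seed;
    int written = 0;
    for (int i = 0; i < count; i++) {
        const char *d = dst_pool[xorshift32(&s) % (sizeof dst_pool / sizeof dst_pool[0])];
        const char *r = src_pool[xorshift32(&s) % (sizeof src_pool / sizeof src_pool[0])];
        if (fprintf(out, "%s %s, %s\n", mnemonic, d, r) > 0)
            written++;
    }
    return written;
}
```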

LiaoMi

Re: VPERMILPS Bug?!
« Reply #4 on: August 02, 2019, 03:03:33 AM »
Ah, so that's how it works, now it's clear :biggrin: Nice, tricky way to refactor code :thumbsup:

AW

Re: VPERMILPS Bug?!
« Reply #5 on: August 02, 2019, 05:59:29 AM »
After assembling, the next step is to disassemble to confirm all is well. I can live with that, I am not using much AVX.  :biggrin:

johnsa

Re: VPERMILPS Bug?!
« Reply #6 on: August 03, 2019, 05:29:29 AM »
Hopefully in a few months it will be a non-issue, but re-creating the code generator is a painful process. I've created a regression test per instruction, so we move them over one at a time to make sure each is right. It's laborious!
Oh, for the good old days of a small ISA with 50 or so instructions, not the 700 or whatever we have now!

LiaoMi

Re: VPERMILPS Bug?!
« Reply #7 on: August 06, 2019, 10:18:40 PM »
Hi johnsa,

Is it time-consuming to update each instruction? Do you need help?

johnsa

Re: VPERMILPS Bug?!
« Reply #8 on: August 06, 2019, 11:08:41 PM »
It is tedious; we could definitely use a hand creating per-instruction regression tests.

Creating the actual instruction entries isn't too bad as the format has been designed to match very closely with the instruction manuals.

habran

  • Member
  • *****
  • Posts: 1210
    • uasm
Re: VPERMILPS Bug?!
« Reply #9 on: August 06, 2019, 11:32:52 PM »
Right now we need a test for crc32 covering all possible combinations:
    CRC32 r32, r/m8      F2 0F 38 F0 /r
    CRC32 r32, r/m8*     F2 REX 0F 38 F0 /r
    CRC32 r32, r/m16     F2 0F 38 F1 /r
    CRC32 r32, r/m32     F2 0F 38 F1 /r
    CRC32 r64, r/m8      F2 REX.W 0F 38 F0 /r
    CRC32 r64, r/m64     F2 REX.W 0F 38 F1 /r

Code:
"crc32",   2, { R32,      R8      },
"crc32",   2, { R32,      R8H     },
"crc32",   2, { R32,      R8E     },
"crc32",   2, { R32,      R8U     },
"crc32",   2, { R32E,     R8      },
"crc32",   2, { R32E,     R8E     },
"crc32",   2, { R32E,     R8U     },
"crc32",   2, { R64,      R8      },
"crc32",   2, { R64,      R8E     },
"crc32",   2, { R64,      R8U     },
"crc32",   2, { R64E,     R8      },
"crc32",   2, { R64E,     R8U     },
"crc32",   2, { R64E,     R8E     },
"crc32",   2, { R64,      R8U     },
"crc32",   2, { R32,      R16     },
"crc32",   2, { R32E,     R16     },
"crc32",   2, { R32,      R32     },
"crc32",   2, { R32E,     R32E    },
"crc32",   2, { R32E,     R32     },
"crc32",   2, { R32,      R32E    },
"crc32",   2, { R64,      R64     },
"crc32",   2, { R64E,     R64E    },
"crc32",   2, { R64,      R64E    },
"crc32",   2, { R64E,     R64     },
"crc32",   2, { R32,      M8      },
"crc32",   2, { R32,      M16     },
"crc32",   2, { R32,      M32     },
"crc32",   2, { R64,      M8      },
"crc32",   2, { R64,      M64     },
Cod-Father
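An operand-class table like the one above can be expanded mechanically into test source lines by giving each class a few sample operands. This is a minimal sketch. The class meanings are my assumption from the names, not taken from UASM's source, and `expand_pair` and the sample choices are hypothetical. The two memory samples target two classic encoding corner cases: `[r12]` needs a SIB byte, and `[r13]` needs an explicit displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SAMPLES 2

/* Assumed class meanings (inferred from the names only):
   R8 = al..bl, R8H = ah..bh, R8U = spl/bpl/sil/dil (REX required),
   R8E = r8b..r15b, R32/R64 = eax..edi / rax..rdi,
   R32E/R64E = r8d..r15d / r8..r15. */
typedef struct {
    const char *name;
    const char *samples[SAMPLES];
} OperandClass;

static const OperandClass classes[] = {
    { "R8",   { "al", "bl" } },
    { "R8H",  { "ah", "ch" } },
    { "R8U",  { "sil", "dil" } },
    { "R8E",  { "r8b", "r15b" } },
    { "R16",  { "ax", "bx" } },
    { "R32",  { "eax", "edi" } },
    { "R32E", { "r8d", "r15d" } },
    { "R64",  { "rax", "rdi" } },
    { "R64E", { "r8", "r15" } },
    { "M8",   { "byte ptr [r12]",  "byte ptr [r13]" } },
    { "M16",  { "word ptr [r12]",  "word ptr [r13]" } },
    { "M32",  { "dword ptr [r12]", "dword ptr [r13]" } },
    { "M64",  { "qword ptr [r12]", "qword ptr [r13]" } },
};

static const OperandClass *find_class(const char *name)
{
    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++)
        if (strcmp(classes[i].name, name) == 0)
            return &classes[i];
    return NULL;
}

/* Print every sample combination for one table row, e.g. { R32, R8U }.
   Returns the number of lines printed, or -1 for an unknown class. */
int expand_pair(FILE *out, const char *mnemonic, const char *dst_cls, const char *src_cls)
{
    assert(out != NULL && mnemonic != NULL);
    const OperandClass *d = find_class(dst_cls);
    const OperandClass *s = find_class(src_cls);
    if (d == NULL || s == NULL)
        return -1;
    int n = 0;
    for (int i = 0; i < SAMPLES; i++)
        for (int j = 0; j < SAMPLES; j++) {
            fprintf(out, "%s %s, %s\n", mnemonic, d->samples[i], s->samples[j]);
            n++;
        }
    return n;
}
```

For example, `expand_pair(stdout, "crc32", "R32", "R8U")` prints `crc32 eax, sil`, `crc32 eax, dil`, `crc32 edi, sil` and `crc32 edi, dil`. Pairings that cannot be encoded do not belong in this expansion. One example is R8H with R32E or R64, because ah..bh cannot appear in an instruction that carries a REX prefix. Such pairings belong in a file of lines that must fail.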

johnsa

Re: VPERMILPS Bug?!
« Reply #10 on: August 07, 2019, 12:04:21 AM »
I can walk somebody through how we create/run the regression tests if someone wants to have a go at one :)

AW

Re: VPERMILPS Bug?!
« Reply #11 on: August 07, 2019, 03:19:06 AM »
I believe LiaoMi may be able to adapt his Haskell project for AVX2 instruction generation for testing. How to integrate it with a 100% C codebase is a challenge.

habran

Re: VPERMILPS Bug?!
« Reply #12 on: August 07, 2019, 03:46:36 PM »
I have volunteered to create a testing piece for crc32 :biggrin:
Code:
crc32 r8,r9                  ;F2 4D 0F 38 F1 C1                 crc32       r8,r9 
crc32 ecx, cl                ;F2 0F 38 F0 C9                    crc32       ecx,cl 
crc32 ecx, ch                ;F2 0F 38 F0 CD                    crc32       ecx,ch 
crc32 ecx, r10b              ;F2 41 0F 38 F0 CA                 crc32       ecx,r10b 
crc32 ecx, sil               ;F2 40 0F 38 F0 CE                 crc32       ecx,sil 
crc32 r10d, al               ;F2 44 0F 38 F0 D0                 crc32       r10d,al 
crc32 r10d, r10b             ;F2 45 0F 38 F0 D2                 crc32       r10d,r10b 
crc32 r10d, sil              ;F2 44 0F 38 F0 D6                 crc32       r10d,sil 
crc32 rcx, al                ;F2 48 0F 38 F0 C8                 crc32       rcx,al 
crc32 rcx, r10b              ;F2 49 0F 38 F0 CA                 crc32       rcx,r10b 
crc32 rcx, sil               ;F2 48 0F 38 F0 CE                 crc32       rcx,sil 
crc32 r10, al                ;F2 4C 0F 38 F0 D0                 crc32       r10,al 
crc32 r10, sil               ;F2 4C 0F 38 F0 D6                 crc32       r10,sil 
crc32 r10, r10b              ;F2 4D 0F 38 F0 D2                 crc32       r10,r10b 
crc32 ecx, ax                ;66 F2 0F 38 F1 C8                 crc32       ecx,ax 
crc32 r10d, bx               ;66 F2 44 0F 38 F1 D3              crc32       r10d,bx 
crc32 ecx, r10w              ;66 F2 41 0F 38 F1 CA              crc32       ecx,r10w 
crc32 r10d,r10w              ;66 F2 45 0F 38 F1 D2              crc32       r10d,r10w 
crc32 ecx, ecx               ;F2 0F 38 F1 C9                    crc32       ecx,ecx 
crc32 r10d, r11d             ;F2 45 0F 38 F1 D3                 crc32       r10d,r11d 
crc32 r10d, ecx              ;F2 44 0F 38 F1 D1                 crc32       r10d,ecx 
crc32 ecx, r11d              ;F2 41 0F 38 F1 CB                 crc32       ecx,r11d 
crc32 rcx, rcx               ;F2 48 0F 38 F1 C9                 crc32       rcx,rcx 
crc32 r10, r11               ;F2 4D 0F 38 F1 D3                 crc32       r10,r11 
crc32 rcx, r10               ;F2 49 0F 38 F1 CA                 crc32       rcx,r10 
crc32 r10, rcx               ;F2 4C 0F 38 F1 D1                 crc32       r10,rcx 
crc32 ecx, dbVar             ;F2 0F 38 F0 0D 4B 2F 00 00        crc32       ecx,byte ptr [dbVar (0404000h)] 
crc32 ecx, dwVar             ;66 F2 0F 38 F1 0D 42 2F 00 00     crc32       ecx,word ptr [dwVar (0404001h)]
crc32 ecx, ddVar             ;F2 0F 38 F1 0D 3B 2F 00 00        crc32       ecx,dword ptr [ddVar (0404003h)] 
crc32 rcx, qvar              ;F2 48 0F 38 F1 0D 35 2F 00 00     crc32       rcx,qword ptr [qvar (0404007h)]
crc32 rcx, dbVar             ;F2 48 0F 38 F0 0D 24 2F 00 00     crc32       rcx,byte ptr [dbVar (0404000h)]
crc32 r10d, dbVar            ;F2 44 0F 38 F0 15 1A 2F 00 00     crc32       r10d,byte ptr [dbVar (0404000h)] 
crc32 r10d, dwVar            ;66 F2 44 0F 38 F1 15 10 2F 00 00  crc32       r10d,word ptr [dwVar (0404001h)]
crc32 r10d, ddVar            ;F2 44 0F 38 F1 15 08 2F 00 00     crc32       r10d,dword ptr [ddVar (0404003h)] 
crc32 r10,  qvar             ;F2 4C 0F 38 F1 15 02 2F 00 00     crc32       r10,qword ptr [qvar (0404007h)] 
crc32 r10,  dbVar            ;F2 4C 0F 38 F0 15 F1 2E 00 00     crc32       r10,byte ptr [dbVar (0404000h)] 
Cod-Father
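The register-register lines in the listing above can be cross-checked by writing the CRC32 encoding rules down directly. The sketch covers register operands in 64-bit mode only, with no memory operands. The `Reg` struct and `encode_crc32_rr` are hypothetical names. The function reproduces the listed bytes for lines such as `crc32 r8,r9`, `crc32 ecx, ch`, `crc32 ecx, sil` and `crc32 ecx, ax`.

```c
#include <assert.h>
#include <stddef.h>

/* Register operand for this sketch.
   size:  8, 16, 32 or 64.
   num:   0-15; for size 8 without high8, 4-7 mean spl/bpl/sil/dil (REX forced).
   high8: nonzero for ah/ch/dh/bh, given as num 0-3 (same order as al/cl/dl/bl). */
typedef struct {
    int size;
    int num;
    int high8;
} Reg;

/* Encode "crc32 dst, src" for two register operands.
   Writes up to 7 bytes to out and returns the length, or -1 if the
   combination is not a valid CRC32 form or cannot be encoded. */
int encode_crc32_rr(Reg dst, Reg src, unsigned char *out)
{
    assert(out != NULL);
    int n = 0, rex = 0;

    /* Valid forms: r32 <- r/m8, r/m16, r/m32 ; r64 <- r/m8, r/m64. */
    if (dst.size == 32) {
        if (src.size != 8 && src.size != 16 && src.size != 32) return -1;
    } else if (dst.size == 64) {
        if (src.size != 8 && src.size != 64) return -1;
        rex |= 0x48;                                   /* REX.W */
    } else {
        return -1;
    }
    if (dst.num < 0 || dst.num > 15 || src.num < 0 || src.num > 15) return -1;
    if (src.high8 && (src.size != 8 || src.num > 3)) return -1;

    int src_field = src.high8 ? src.num + 4 : src.num; /* ah..bh encode as 4-7 */
    if (dst.num >= 8) rex |= 0x44;                     /* REX.R */
    if (src.num >= 8) rex |= 0x41;                     /* REX.B */
    if (src.size == 8 && !src.high8 && src.num >= 4 && src.num <= 7)
        rex |= 0x40;                                   /* spl..dil need a plain REX */
    if (src.high8 && rex != 0) return -1;              /* ah..bh are unreachable with REX */

    if (src.size == 16) out[n++] = 0x66;               /* operand size, before F2 as in the listing */
    out[n++] = 0xF2;                                   /* mandatory prefix */
    if (rex) out[n++] = (unsigned char)rex;
    out[n++] = 0x0F;
    out[n++] = 0x38;
    out[n++] = (src.size == 8) ? 0xF0 : 0xF1;
    out[n++] = (unsigned char)(0xC0 | ((dst.num & 7) << 3) | (src_field & 7)); /* mod = 11 */
    return n;
}
```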

johnsa

Re: VPERMILPS Bug?!
« Reply #13 on: August 07, 2019, 06:21:16 PM »
The approach I take is as follows:

1) For each instruction, create a plain BIN-format source file for 32-bit and for 64-bit. These can be found in the regress/src folder.
2) The instruction must be tested in every possible combination (and this is the hard part): sil/dil vs. high-byte registers, registers whose number is < 8, < 16, < 32, and combinations of those,
    combined with an array of addressing modes for memory operands, once again using registers from the different number banks.
3) Additionally, an error source file is created to specifically test variations of the instruction that should fail.
4) If the instruction supports various forms of prefixes, these must be included too.

Once that is all done, the regression test can run it automatically and compare the output BIN to a known-good/expected result file. Here is where the second part of the slow process is:

5) Use UASM to assemble the BIN source file, take the resulting hex, and go through each opcode manually with Defuse to validate it. In addition, I assemble the same instructions through the Defuse interface to ensure we have selected the correct opcode and encoding (i.e. the optimal, shorter sequences). This result is then used to verify the expected-result file, which is stored in regress/exp.
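The automatic part of the run, comparing the assembled BIN with the known-good file in regress/exp, comes down to a byte-for-byte comparison. Here is a minimal sketch that assumes both files have already been read into memory. The function name `first_mismatch` is hypothetical. Reporting the offset of the first difference is an addition of mine to make a failure easy to map back to a source line, and it is not necessarily what UASM's regression runner does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Compare assembled output with the expected bytes.
   Returns -1 if they are identical; otherwise prints and returns the offset
   of the first difference (a length mismatch counts as a difference at the
   end of the shorter buffer). */
long first_mismatch(const unsigned char *got, size_t got_len,
                    const unsigned char *exp, size_t exp_len)
{
    assert((got != NULL || got_len == 0) && (exp != NULL || exp_len == 0));
    size_t n = got_len < exp_len ? got_len : exp_len;
    for (size_t i = 0; i < n; i++) {
        if (got[i] != exp[i]) {
            fprintf(stderr, "mismatch at offset %zu: got %02X, expected %02X\n",
                    i, (unsigned)got[i], (unsigned)exp[i]);
            return (long)i;
        }
    }
    if (got_len != exp_len) {
        fprintf(stderr, "length differs: got %zu bytes, expected %zu\n",
                got_len, exp_len);
        return (long)n;
    }
    return -1;
}
```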


LiaoMi

Re: VPERMILPS Bug?!
« Reply #14 on: August 07, 2019, 10:28:03 PM »
Hi AW,

It was this project that I used as a basis =)

Hi habran,

We all know that asking for help is the best motivation to solve a problem yourself :) Thanks for volunteering!

Hi johnsa,

Thanks for the description! Now it's clear what the basic structure looks like.