Author Topic: ZMM instructions

AW

  • Member
  • *****
  • Posts: 2387
  • Let's Make ASM Great Again!
ZMM instructions
« on: November 05, 2019, 06:23:13 PM »
I don't think instructions that use indirect addressing with a displacement and a zmm register (and others, like vmovdqu8) are translated correctly:

zmmproc proc dest:ptr, src:ptr, _size:size_t
    mov rax,rcx                             ; rax = dest (first Win64 argument)
    vmovups zmm0, [rdx]                     ; first 64 bytes of src
    vmovups zmm1, zmmword ptr [rdx+r8-64]   ; last 64 bytes of src (r8 = _size)
    vmovups [rax],zmm0
    vmovups [rax+r8-64],zmm1
    ret
zmmproc endp


00007ff700e016b0  push         rbp 
00007ff700e016b1  mov         rbp, rsp 
00007ff700e016b4  sub         rsp, 0x20 
00007ff700e016b8  mov         rax, rcx 
00007ff700e016bb  vmovups         zmm0, k0, zmmword ptr [rdx] 
00007ff700e016c1  vmovups         zmm1, k0, zmmword ptr [r8+r10-0x40]  <------------
00007ff700e016c9  vmovups         zmmword ptr [rax], k0, zmm0 
00007ff700e016cf  vmovups         zmmword ptr [r8+r8-0x40], k0, zmm1  <--------------
00007ff700e016d7  leave 
00007ff700e016d8  ret

nidud

  • Member
  • *****
  • Posts: 1762
    • https://github.com/nidud/asmc
Re: ZMM instructions
« Reply #1 on: November 06, 2019, 03:59:04 AM »
Confirmed.

There's an auto-detect feature for [E]VEX encoding based on the opcode and the registers used, and SIMD registers above 7 are flagged when used for this purpose.

    vmovups zmm7,[rdx+r8-64]
    vmovups zmm8,[rdx+r8-64]
...
   0:   62 91 7c 48 10 7c 10    vmovups zmm7,ZMMWORD PTR [r8+r10*1-0x40]
   7:   ff
   8:   62 51 7c 48 10 44 10    vmovups zmm8,ZMMWORD PTR [r8+rdx*1-0x40]
   f:   ff

The regression test only uses register numbers above 7, so this was not picked up in the test.
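
For anyone who wants to check dumps like the two above by hand, here is a minimal Python sketch that pulls the destination register, base, index and displacement out of an EVEX-encoded `vmovups zmm, [base+index+disp8]`. It is not asmc code. The name `decode_evex_mem` is made up for illustration, and the sketch assumes the 8-byte form shown above: 0x62, three payload bytes, the opcode, a ModRM byte with a SIB byte, and a disp8 that is scaled by 64 for a full 512-bit operand.

```python
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_evex_mem(code: bytes):
    """Decode 62 P0 P1 P2 opcode ModRM SIB disp8 (hypothetical helper)."""
    if len(code) != 8 or code[0] != 0x62:
        raise ValueError("expected an 8-byte EVEX instruction")
    p0, modrm, sib = code[1], code[5], code[6]
    # R, X, B and R' are stored inverted in bits 7, 6, 5 and 4 of P0.
    r = ((p0 >> 7) & 1) ^ 1
    x = ((p0 >> 6) & 1) ^ 1
    b = ((p0 >> 5) & 1) ^ 1
    r_hi = ((p0 >> 4) & 1) ^ 1
    if modrm >> 6 != 1 or modrm & 7 != 4:
        raise ValueError("expected mod=01 (disp8) with a SIB byte")
    zmm = ((modrm >> 3) & 7) | (r << 3) | (r_hi << 4)
    # The "no index" special case (index field 4 with X clear) is not handled.
    index = ((sib >> 3) & 7) | (x << 3)
    base = (sib & 7) | (b << 3)
    scale = 1 << (sib >> 6)
    # Sign-extend disp8, then apply the disp8*64 scaling of a 512-bit access.
    disp8 = code[7] - 256 if code[7] >= 128 else code[7]
    return zmm, GPR64[base], GPR64[index], scale, disp8 * 64

for dump in ("62 91 7c 48 10 7c 10 ff", "62 51 7c 48 10 44 10 ff"):
    zmm, base, index, scale, disp = decode_evex_mem(bytes.fromhex(dump))
    print(f"zmm{zmm}, [{base}+{index}*{scale}{disp:+#x}]")
```

Both dumps decode with r8 as the base register, which agrees with the objdump text: in the zmm7 case the index comes out as r10 instead of the rdx that was written in the source.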

AW

Re: ZMM instructions
« Reply #2 on: November 06, 2019, 04:09:10 PM »
Thanks.
I guess this will take a long time to fix, given the huge number of instructions you will have to review.

nidud

Re: ZMM instructions
« Reply #3 on: November 08, 2019, 02:00:40 AM »
I made an update for the indirect addressing issue. The base and index registers were flipped, so this should work now.

   0:   62 b1 7c 48 10 44 02    vmovups zmm0,ZMMWORD PTR [rdx+r8*1-0x40]
   7:   ff
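
As a cross-check of the bytes above, here is a small Python sketch that encodes `vmovups zmmN, [base+index*1+disp]` the way the EVEX format lays it out. It is only an illustration, not how asmc does it internally: the function name `encode_vmovups_load` is hypothetical, and it covers just the 512-bit load form with a scaled disp8 and no masking.

```python
GPR64 = {name: number for number, name in enumerate(
    ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
     "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"])}

def _inv_bit(value: int, shift: int) -> int:
    """Return the selected bit, inverted, as EVEX stores R/X/B/R'."""
    return ((value >> shift) & 1) ^ 1

def encode_vmovups_load(zmm: int, base: str, index: str, disp: int) -> bytes:
    """Encode vmovups zmmN, [base+index*1+disp] (hypothetical helper)."""
    b, x = GPR64[base], GPR64[index]
    if not 0 <= zmm <= 31:
        raise ValueError("zmm register number out of range")
    if x == 4:
        raise ValueError("rsp cannot be used as an index register")
    if disp % 64 or not -128 <= disp // 64 <= 127:
        raise ValueError("displacement does not fit disp8*64")
    # P0: inverted R, X, B, R' in bits 7..4, opcode map 0F in bits 1..0.
    p0 = ((_inv_bit(zmm, 3) << 7) | (_inv_bit(x, 3) << 6)
          | (_inv_bit(b, 3) << 5) | (_inv_bit(zmm, 4) << 4) | 0b01)
    p1 = 0x7C  # W=0, vvvv unused (all ones), fixed bit set, no SIMD prefix
    p2 = 0x48  # 512-bit vector length, no masking, no broadcast
    modrm = 0x44 | ((zmm & 7) << 3)   # mod=01 (disp8), rm=100 (SIB follows)
    sib = ((x & 7) << 3) | (b & 7)    # scale=1, low bits of index and base
    return bytes([0x62, p0, p1, p2, 0x10, modrm, sib, (disp // 64) & 0xFF])

print(encode_vmovups_load(0, "rdx", "r8", -64).hex())
```

Swapping the `base` and `index` arguments changes both the SIB byte and the X/B bits in P0, so a flipped pair is visible directly in the byte dump.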


AW

Re: ZMM instructions
« Reply #4 on: November 08, 2019, 04:08:07 AM »
Code: [Select]
zmmproc proc dest:ptr, src:ptr, _size:size_t

    mov rax,rcx
    vmovups zmm0, [rdx]
    vmovups zmm1, zmmword ptr [rdx+r8-64]
    vmovups [rax],zmm0
    vmovups [rax+r8-64],zmm1

    ret
zmmproc endp


00007ff7f82d1730 push rbp 
00007ff7f82d1731  mov rbp, rsp 
00007ff7f82d1734  sub rsp, 0x20 
00007ff7f82d1738  mov rax, rcx 
00007ff7f82d173b  vmovups zmm0, k0, zmmword ptr [rdx] 
00007ff7f82d1741  vmovups zmm1, k0, zmmword ptr [rdx+r8-0x40] 
00007ff7f82d1749  vmovups zmmword ptr [rax], k0, zmm0 
00007ff7f82d174f  vmovups zmmword ptr [rax+r8-0x40], k0, zmm1 
00007ff7f82d1757  leave 
00007ff7f82d1758  ret 

 :thumbsup:

AW

Re: ZMM instructions
« Reply #5 on: November 09, 2019, 01:50:51 AM »
Two ZMM problems:
1) ZMM register numbers change. ZMM8 to ZMM15 come out as ZMM24 to ZMM31 in all the places I have seen.
2) The size of a ZMMWORD stack variable is taken to be 0x20 bytes, not 0x40 bytes.

Code: [Select]
zmm_fast proc src : ptr, dest:ptr, _size:qword
LOCAL zmmsave6 : ZMMWORD
LOCAL zmmsave7 : ZMMWORD
LOCAL zmmsave8 : ZMMWORD
LOCAL zmmsave9 : ZMMWORD
LOCAL zmmsave10 : ZMMWORD
LOCAL zmmsave11 : ZMMWORD
LOCAL zmmsave12 : ZMMWORD
LOCAL zmmsave13 : ZMMWORD
LOCAL zmmsave14 : ZMMWORD
LOCAL zmmsave15 : ZMMWORD
LOCAL zmmsave16 : ZMMWORD
LOCAL zmmsave17 : ZMMWORD
LOCAL zmmsave18 : ZMMWORD
LOCAL zmmsave19 : ZMMWORD
LOCAL zmmsave20 : ZMMWORD
LOCAL zmmsave21 : ZMMWORD
LOCAL zmmsave22 : ZMMWORD
LOCAL zmmsave23 : ZMMWORD
LOCAL zmmsave24 : ZMMWORD
LOCAL zmmsave25 : ZMMWORD
LOCAL zmmsave26 : ZMMWORD
LOCAL zmmsave27 : ZMMWORD
LOCAL zmmsave28 : ZMMWORD
LOCAL zmmsave29 : ZMMWORD
LOCAL zmmsave30 : ZMMWORD
LOCAL zmmsave31 : ZMMWORD

vmovups zmmsave6, zmm6
vmovups zmmsave7, zmm7
vmovups zmmsave8, zmm8
vmovups zmmsave9, zmm9
vmovups zmmsave10, zmm10
vmovups zmmsave11, zmm11
vmovups zmmsave12, zmm12
vmovups zmmsave13, zmm13
vmovups zmmsave14, zmm14
vmovups zmmsave15, zmm15
vmovups zmmsave16, zmm16
vmovups zmmsave17, zmm17
vmovups zmmsave18, zmm18
vmovups zmmsave19, zmm19
vmovups zmmsave20, zmm20
vmovups zmmsave21, zmm21
vmovups zmmsave22, zmm22
vmovups zmmsave23, zmm23
vmovups zmmsave24, zmm24
vmovups zmmsave25, zmm25
vmovups zmmsave26, zmm26
vmovups zmmsave27, zmm27
vmovups zmmsave28, zmm28
vmovups zmmsave29, zmm29
vmovups zmmsave30, zmm30
vmovups zmmsave31, zmm31

Translated to:
Code: [Select]
00007ff691df17b0  push rbp 
00007ff691df17b1  mov rbp, rsp 
00007ff691df17b4  sub rsp, 0x360 
00007ff691df17bb  vmovups zmmword ptr [rbp-0x20], k0, zmm6 
00007ff691df17c5  vmovups zmmword ptr [rbp-0x40], k0, zmm7 
00007ff691df17cc  vmovups zmmword ptr [rbp-0x60], k0, zmm24 
00007ff691df17d6  vmovups zmmword ptr [rbp-0x80], k0, zmm25 
00007ff691df17dd  vmovups zmmword ptr [rbp-0xa0], k0, zmm26 
00007ff691df17e7  vmovups zmmword ptr [rbp-0xc0], k0, zmm27 
00007ff691df17ee  vmovups zmmword ptr [rbp-0xe0], k0, zmm28 
00007ff691df17f8  vmovups zmmword ptr [rbp-0x100], k0, zmm29 
00007ff691df17ff  vmovups zmmword ptr [rbp-0x120], k0, zmm30 
00007ff691df1809  vmovups zmmword ptr [rbp-0x140], k0, zmm31 
00007ff691df1810  vmovups zmmword ptr [rbp-0x160], k0, zmm16 
00007ff691df181a  vmovups zmmword ptr [rbp-0x180], k0, zmm17 
00007ff691df1821  vmovups zmmword ptr [rbp-0x1a0], k0, zmm18 
00007ff691df182b  vmovups zmmword ptr [rbp-0x1c0], k0, zmm19 
00007ff691df1832  vmovups zmmword ptr [rbp-0x1e0], k0, zmm20 
00007ff691df183c  vmovups zmmword ptr [rbp-0x200], k0, zmm21 
00007ff691df1843  vmovups zmmword ptr [rbp-0x220], k0, zmm22 
00007ff691df184d  vmovups zmmword ptr [rbp-0x240], k0, zmm23 
00007ff691df1854  vmovups zmmword ptr [rbp-0x260], k0, zmm24 
00007ff691df185e  vmovups zmmword ptr [rbp-0x280], k0, zmm25 
00007ff691df1865  vmovups zmmword ptr [rbp-0x2a0], k0, zmm26 
00007ff691df186f  vmovups zmmword ptr [rbp-0x2c0], k0, zmm27 
00007ff691df1876  vmovups zmmword ptr [rbp-0x2e0], k0, zmm28 
00007ff691df1880  vmovups zmmword ptr [rbp-0x300], k0, zmm29 
00007ff691df1887  vmovups zmmword ptr [rbp-0x320], k0, zmm30 
00007ff691df1891  vmovups zmmword ptr [rbp-0x340], k0, zmm31 
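
A note on the renaming seen in the listing above. An EVEX register number has five bits, split across EVEX.R' (bit 4), EVEX.R (bit 3) and ModRM.reg (bits 2..0). The Python sketch below shows one hypothetical mis-encoding that reproduces exactly this mapping: R' gets set whenever R is set. This is only a guess that fits the output; the thread does not say what the actual cause in the assembler was, and the helper names are made up.

```python
def split_reg(n: int):
    """Split a register number 0..31 into logical (R', R, reg) bits."""
    if not 0 <= n <= 31:
        raise ValueError("register number out of range")
    return (n >> 4) & 1, (n >> 3) & 1, n & 7

def join_reg(r_hi: int, r: int, reg: int) -> int:
    """Rebuild the register number a disassembler would report."""
    return (r_hi << 4) | (r << 3) | reg

def hypothetical_bug(n: int) -> int:
    """ASSUMPTION: R' is wrongly set whenever R is set."""
    r_hi, r, reg = split_reg(n)
    return join_reg(r_hi | r, r, reg)

for n in range(6, 32):
    print(f"zmm{n} -> zmm{hypothetical_bug(n)}")
```

Under that assumption zmm6, zmm7 and zmm16 to zmm31 are unchanged, while zmm8 to zmm15 each gain 16, which is the pattern in the listing.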

Same problem on exit:

Code: [Select]
@exit:
vmovups zmm6, zmmsave6
vmovups zmm7, zmmsave7
vmovups zmm8, zmmsave8
vmovups zmm9, zmmsave9
vmovups zmm10, zmmsave10
vmovups zmm11, zmmsave11
vmovups zmm12, zmmsave12
vmovups zmm13, zmmsave13
vmovups zmm14, zmmsave14
vmovups zmm15, zmmsave15
vmovups zmm16, zmmsave16
vmovups zmm17, zmmsave17
vmovups zmm18, zmmsave18
vmovups zmm19, zmmsave19
vmovups zmm20, zmmsave20
vmovups zmm21, zmmsave21
vmovups zmm22, zmmsave22
vmovups zmm23, zmmsave23
vmovups zmm24, zmmsave24
vmovups zmm25, zmmsave25
vmovups zmm26, zmmsave26
vmovups zmm27, zmmsave27
vmovups zmm28, zmmsave28
vmovups zmm29, zmmsave29
vmovups zmm30, zmmsave30
vmovups zmm31, zmmsave31

Translated to:

Code: [Select]
00007ff691df22b5  vmovups zmm6, k0, zmmword ptr [rbp-0x20] 
00007ff691df22bf  vmovups zmm7, k0, zmmword ptr [rbp-0x40] 
00007ff691df22c6  vmovups zmm24, k0, zmmword ptr [rbp-0x60] 
00007ff691df22d0  vmovups zmm25, k0, zmmword ptr [rbp-0x80] 
00007ff691df22d7  vmovups zmm26, k0, zmmword ptr [rbp-0xa0] 
00007ff691df22e1  vmovups zmm27, k0, zmmword ptr [rbp-0xc0] 
00007ff691df22e8  vmovups zmm28, k0, zmmword ptr [rbp-0xe0] 
00007ff691df22f2  vmovups zmm29, k0, zmmword ptr [rbp-0x100] 
00007ff691df22f9  vmovups zmm30, k0, zmmword ptr [rbp-0x120] 
00007ff691df2303  vmovups zmm31, k0, zmmword ptr [rbp-0x140] 
00007ff691df230a  vmovups zmm16, k0, zmmword ptr [rbp-0x160] 
00007ff691df2314  vmovups zmm17, k0, zmmword ptr [rbp-0x180] 
00007ff691df231b  vmovups zmm18, k0, zmmword ptr [rbp-0x1a0] 
00007ff691df2325  vmovups zmm19, k0, zmmword ptr [rbp-0x1c0] 
00007ff691df232c  vmovups zmm20, k0, zmmword ptr [rbp-0x1e0] 
00007ff691df2336  vmovups zmm21, k0, zmmword ptr [rbp-0x200] 
00007ff691df233d  vmovups zmm22, k0, zmmword ptr [rbp-0x220] 
00007ff691df2347  vmovups zmm23, k0, zmmword ptr [rbp-0x240] 
00007ff691df234e  vmovups zmm24, k0, zmmword ptr [rbp-0x260] 
00007ff691df2358  vmovups zmm25, k0, zmmword ptr [rbp-0x280] 
00007ff691df235f  vmovups zmm26, k0, zmmword ptr [rbp-0x2a0] 
00007ff691df2369  vmovups zmm27, k0, zmmword ptr [rbp-0x2c0] 
00007ff691df2370  vmovups zmm28, k0, zmmword ptr [rbp-0x2e0] 
00007ff691df237a  vmovups zmm29, k0, zmmword ptr [rbp-0x300] 
00007ff691df2381  vmovups zmm30, k0, zmmword ptr [rbp-0x320] 
00007ff691df238b  vmovups zmm31, k0, zmmword ptr [rbp-0x340] 
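
To put numbers on the second problem, here is a rough Python sketch of the frame layout. It assumes the locals are allocated downward from rbp in declaration order, as the listings suggest; the helper names are invented for illustration. It compares the observed 0x20-byte slots with the 0x40-byte slots a ZMMWORD needs, and counts how many bytes of a 64-byte store land at or above [rbp].

```python
def slot_offsets(count: int, slot_size: int):
    """rbp-relative offsets of `count` locals allocated downward from rbp."""
    return [-(i + 1) * slot_size for i in range(count)]

def bytes_at_or_above_rbp(offset: int, access_size: int = 64) -> int:
    """Bytes of an access starting at rbp+offset that reach [rbp] or higher."""
    return max(0, offset + access_size)

observed = slot_offsets(26, 0x20)   # slot size seen in the listings
expected = slot_offsets(26, 0x40)   # one full ZMMWORD per local

print(hex(observed[0]), hex(observed[-1]))
print(hex(expected[0]), hex(expected[-1]))
print(bytes_at_or_above_rbp(observed[0]))
print(bytes_at_or_above_rbp(expected[0]))
```

With 0x20-byte slots every 64-byte store overlaps its neighbour by 32 bytes, and the store to [rbp-0x20] also reaches the saved rbp and the return address above it.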

LiaoMi

  • Member
  • ****
  • Posts: 590
Re: ZMM instructions
« Reply #6 on: November 09, 2019, 06:10:23 AM »
Hi AW,

Cool test. I think the instruction generator also needs to handle the variant with local variables. I have already implemented zmm1{k1}{z} generation, but have not yet tested it for logical errors. Here is an ml64 example without the {k1}{z} mask, using the full set of zmm registers.

I will add number conversion, variations of addition and subtraction, and local variables to the project. Thank you!

P.S. I found an error with memory as the first operand, so there is no example of that so far.

nidud

Re: ZMM instructions
« Reply #7 on: November 09, 2019, 07:59:52 PM »
New update.

There are probably still a few other things that need adjustment, but the registers and stack size seem to work now.

    local zmm[32]:zword

    for q,<0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,
           18,19,20,21,22,23,24,25,26,27,28,29,30,31>
        vmovups zmm[q*zword],zmm&q&
    endm

Code: [Select]
   0: 55                    push   rbp
   1: 48 8b ec              mov    rbp,rsp
   4: 48 81 ec 00 08 00 00 sub    rsp,0x800
   b: 62 f1 7c 48 11 45 e0 vmovups ZMMWORD PTR [rbp-0x800],zmm0
  12: 62 f1 7c 48 11 4d e1 vmovups ZMMWORD PTR [rbp-0x7c0],zmm1
  19: 62 f1 7c 48 11 55 e2 vmovups ZMMWORD PTR [rbp-0x780],zmm2
  20: 62 f1 7c 48 11 5d e3 vmovups ZMMWORD PTR [rbp-0x740],zmm3
  27: 62 f1 7c 48 11 65 e4 vmovups ZMMWORD PTR [rbp-0x700],zmm4
  2e: 62 f1 7c 48 11 6d e5 vmovups ZMMWORD PTR [rbp-0x6c0],zmm5
  35: 62 f1 7c 48 11 75 e6 vmovups ZMMWORD PTR [rbp-0x680],zmm6
  3c: 62 f1 7c 48 11 7d e7 vmovups ZMMWORD PTR [rbp-0x640],zmm7
  43: 62 71 7c 48 11 45 e8 vmovups ZMMWORD PTR [rbp-0x600],zmm8
  4a: 62 71 7c 48 11 4d e9 vmovups ZMMWORD PTR [rbp-0x5c0],zmm9
  51: 62 71 7c 48 11 55 ea vmovups ZMMWORD PTR [rbp-0x580],zmm10
  58: 62 71 7c 48 11 5d eb vmovups ZMMWORD PTR [rbp-0x540],zmm11
  5f: 62 71 7c 48 11 65 ec vmovups ZMMWORD PTR [rbp-0x500],zmm12
  66: 62 71 7c 48 11 6d ed vmovups ZMMWORD PTR [rbp-0x4c0],zmm13
  6d: 62 71 7c 48 11 75 ee vmovups ZMMWORD PTR [rbp-0x480],zmm14
  74: 62 71 7c 48 11 7d ef vmovups ZMMWORD PTR [rbp-0x440],zmm15
  7b: 62 e1 7c 48 11 45 f0 vmovups ZMMWORD PTR [rbp-0x400],zmm16
  82: 62 e1 7c 48 11 4d f1 vmovups ZMMWORD PTR [rbp-0x3c0],zmm17
  89: 62 e1 7c 48 11 55 f2 vmovups ZMMWORD PTR [rbp-0x380],zmm18
  90: 62 e1 7c 48 11 5d f3 vmovups ZMMWORD PTR [rbp-0x340],zmm19
  97: 62 e1 7c 48 11 65 f4 vmovups ZMMWORD PTR [rbp-0x300],zmm20
  9e: 62 e1 7c 48 11 6d f5 vmovups ZMMWORD PTR [rbp-0x2c0],zmm21
  a5: 62 e1 7c 48 11 75 f6 vmovups ZMMWORD PTR [rbp-0x280],zmm22
  ac: 62 e1 7c 48 11 7d f7 vmovups ZMMWORD PTR [rbp-0x240],zmm23
  b3: 62 61 7c 48 11 45 f8 vmovups ZMMWORD PTR [rbp-0x200],zmm24
  ba: 62 61 7c 48 11 4d f9 vmovups ZMMWORD PTR [rbp-0x1c0],zmm25
  c1: 62 61 7c 48 11 55 fa vmovups ZMMWORD PTR [rbp-0x180],zmm26
  c8: 62 61 7c 48 11 5d fb vmovups ZMMWORD PTR [rbp-0x140],zmm27
  cf: 62 61 7c 48 11 65 fc vmovups ZMMWORD PTR [rbp-0x100],zmm28
  d6: 62 61 7c 48 11 6d fd vmovups ZMMWORD PTR [rbp-0xc0],zmm29
  dd: 62 61 7c 48 11 75 fe vmovups ZMMWORD PTR [rbp-0x80],zmm30
  e4: 62 61 7c 48 11 7d ff vmovups ZMMWORD PTR [rbp-0x40],zmm31
  eb: c9                    leave 
  ec: c3                    ret   
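
The last byte of each instruction in the dump above runs from e0 to ff, which is the compressed displacement (disp8*N with N = 64 for a full 512-bit access). A short Python sketch of that rule, with a made-up helper name; it returns None when an offset cannot be expressed this way and a 32-bit displacement would be needed instead.

```python
def compressed_disp8(offset: int, n: int = 64):
    """Return the disp8 byte for `offset` under disp8*N, or None if it does not fit."""
    quotient, remainder = divmod(offset, n)
    if remainder != 0 or not -128 <= quotient <= 127:
        return None
    return quotient & 0xFF

frame = 32 * 64   # local zmm[32]:zword, i.e. 0x800 bytes
for q in (0, 1, 31):
    offset = -frame + q * 64
    print(q, hex(offset), hex(compressed_disp8(offset)))

print(compressed_disp8(-0x20))
```

An offset such as -0x20 is not a multiple of 64, so it has no disp8 form. That would be consistent with the alternating 10-byte and 7-byte instruction lengths visible in the addresses of the earlier listing with 0x20-byte slots.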

AW

Re: ZMM instructions
« Reply #8 on: November 10, 2019, 12:46:54 AM »
hi Nidud,
Thanks, it works fine (so far).  :thumbsup:

Hi LiaoMi,
This should be very useful, but it is a major time-consuming (ad)venture.