Preserving YMM registers in JWasm

Started by habran, June 04, 2015, 09:35:48 PM


habran

Hello boys  :biggrin:
I've been working on preserving ymm registers in the new version of JWasm.
The current version is able to preserve only xmm registers.
My approach:
SSE4.2       preserves xmm registers
AVX & AVX2   preserves ymm registers
AVX512       preserves zmm registers
Here is one example, built with the new version, which from now on will be named HJWasm:
Quote
testproc PROC FRAME USES rbx ymm6  ymm15  val:QWORD, val2:DWORD
0000000000BA1970 48 89 4C 24 08       mov         qword ptr [rsp+8],rcx 
0000000000BA1975 48 89 54 24 10       mov         qword ptr [rsp+10h],rdx 
0000000000BA197A 53                   push        rbx 
0000000000BA197B 48 81 EC 80 00 00 00 sub         rsp,80h 
0000000000BA1982 C5 FD 7F 74 24 40    vmovdqa     ymmword ptr [rsp+40h],ymm6 
0000000000BA1988 C5 7D 7F 7C 24 60    vmovdqa     ymmword ptr [rsp+60h],ymm15 
   
   LOCAL aVar:QWORD   
   LOCAL bVar:QWORD       
   mov eax,val2     
0000000000BA198E 8B 84 24 98 00 00 00 mov         eax,dword ptr [val2] 
   mov bVar,rax   
0000000000BA1995 48 89 44 24 70       mov         qword ptr [bVar],rax 
   mov val,33
0000000000BA199A 48 C7 84 24 90 00 00 00 21 00 00 00 mov         qword ptr [val],21h 
   invoke printf,TZ("val %d"),val
0000000000BA19A6 48 8B 94 24 90 00 00 00 mov         rdx,qword ptr [val] 
0000000000BA19AE 48 B9 40 54 BA 00 00 00 00 00 mov         rcx,0BA5440h 
0000000000BA19B8 E8 75 12 00 00       call        printf (0BA2C32h) 
        
   mov rax,34           
0000000000BA19BD 48 C7 C0 22 00 00 00 mov         rax,22h 
   mov aVar,rax   
0000000000BA19C4 48 89 44 24 68       mov         qword ptr [aVar],rax 
      
   invoke printf,TZ("aVar %d"),aVar
0000000000BA19C9 48 8B 54 24 68       mov         rdx,qword ptr [aVar] 
0000000000BA19CE 48 B9 47 54 BA 00 00 00 00 00 mov         rcx,0BA5447h 
0000000000BA19D8 E8 55 12 00 00       call        printf (0BA2C32h) 
      
   invoke testproc2,22   
0000000000BA19DD 48 C7 C1 16 00 00 00 mov         rcx,16h 
0000000000BA19E4 E8 E1 FE FF FF       call        testproc2 (0BA18CAh) 
          
   ret       
0000000000BA19E9 C5 FD 6F 74 24 40    vmovdqa     ymm6,ymmword ptr [rsp+40h] 
0000000000BA19EF C5 7D 6F 7C 24 60    vmovdqa     ymm15,ymmword ptr [rsp+60h] 
0000000000BA19F5 48 81 C4 80 00 00 00 add         rsp,80h 
0000000000BA19FC 5B                   pop         rbx 
0000000000BA19FD C3                   ret
Your opinion about it would be appreciated 8)
Cod-Father

hutch--

I may not have understood the dump, but it looks like the content of each register, at its full size, is written to the stack. Making room at the address of RSP makes sense, and correcting the stack on exit leaves it balanced, so it should work OK. Just as a suggestion, is it worth making a pseudo mnemonic in the form of a macro to automate this process, say PUSH128 and PUSH256 with matching pops?
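
A quick way to sanity-check a frame like the one in the ymm listing is to treat every slot as a byte range relative to rsp and test the ranges against each other. The sketch below is only that, a sketch: the offsets are read off the displacement bytes in the dump (40h and 60h for the two 32-byte saves, 68h and 70h for aVar and bVar), and `struct slot`, `slots_overlap`, `count_overlaps` and `ymm_frame` are names made up for illustration, not anything from JWasm.

```c
#include <assert.h>
#include <stdio.h>

/* A byte range [start, start+size) relative to rsp after "sub rsp,80h". */
struct slot {
    const char *name;
    unsigned start;
    unsigned size;
};

/* Two half-open ranges overlap when each one starts before the other ends. */
int slots_overlap(struct slot a, struct slot b)
{
    return a.start < b.start + b.size && b.start < a.start + a.size;
}

/* Print every overlapping pair in the table and return how many there are. */
int count_overlaps(const struct slot *s, int n)
{
    int hits = 0;

    assert(s != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (slots_overlap(s[i], s[j])) {
                printf("%s overlaps %s\n", s[i].name, s[j].name);
                hits++;
            }
        }
    }
    return hits;
}

/* Offsets copied from the displacement bytes in the ymm listing. */
const struct slot ymm_frame[] = {
    { "ymm6 save",  0x40, 32 },
    { "ymm15 save", 0x60, 32 },
    { "aVar",       0x68, 8  },
    { "bVar",       0x70, 8  },
};
```

With these numbers `count_overlaps(ymm_frame, 4)` reports two pairs, aVar and bVar both falling inside the ymm15 slot. Going only by the dump, it may be worth checking where the locals are placed relative to the saved registers.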

habran

Hey hutch-- :biggrin:
thanks for giving a thought about it
let's put it this way:
why not walk, or even better, swim from Adelaide to Sydney instead of flying or driving?

It is not only a matter of accuracy, practicality, simplicity and speed, but also of beauty
To me Assembler is an art
when I do programming I am not writing a program, I am creating it
I am The Creator, The Artist :bgrin:
Cod-Father

Gunther

Hi Habran,

I agree with Hutch. Is it really necessary? But let's make a compromise: develop the pseudo mnemonic. Everyone can decide to use it or not. Good luck.

Gunther
You have to know the facts before you can distort them.

rrr314159

It seems fine ... as far as I can tell you can simply include SIMD regs in your "uses" statement, just like GPR's. If u don't want to save/restore them, don't "use" them, no harm done. As Gunther says "Everyone can decide to use it or not" - on a case by case basis. Just like sometimes you "use" ebx, sometimes  not, depending on the specific procedure

BTW hutch, not exactly same as you're saying, but I wrote macros to push / pop xmm and ymm registers (pushx, popx, pushy, popy), simple and useful. Accepts a list as well as just one register
I am NaN ;)

habran

what are you talking about Gunther :icon_eek:
I am a little bit disappointed with you my friend, I thought that you were aware that
the current version of JWasm does this:
Quote
testproc PROC FRAME USES rbx xmm6  xmm15  val:QWORD, val2:DWORD
000000000025197A 48 89 4C 24 08       mov         qword ptr [rsp+8],rcx 
000000000025197F 48 89 54 24 10       mov         qword ptr [rsp+10h],rdx 
0000000000251984 53                   push        rbx 
0000000000251985 48 83 EC 60          sub         rsp,60h 
0000000000251989 66 0F 7F 74 24 40    movdqa      xmmword ptr [rsp+40h],xmm6 
000000000025198F 66 44 0F 7F 7C 24 50 movdqa      xmmword ptr [bVar],xmm15 
   
   LOCAL aVar:QWORD   
   LOCAL bVar:QWORD       
   mov eax,val2     
0000000000251996 8B 44 24 78          mov         eax,dword ptr [val2] 
   mov bVar,rax   
000000000025199A 48 89 44 24 50       mov         qword ptr [bVar],rax 
   mov val,33
000000000025199F 48 C7 44 24 70 21 00 00 00 mov         qword ptr [val],21h 
   invoke printf,TZ("val %d"),val
00000000002519A8 48 8B 54 24 70       mov         rdx,qword ptr [val] 
00000000002519AD 48 B9 40 54 25 00 00 00 00 00 mov         rcx,255440h 
00000000002519B7 E8 76 12 00 00       call        printf (0252C32h) 
        
   mov rax,34           
00000000002519BC 48 C7 C0 22 00 00 00 mov         rax,22h 
   mov aVar,rax   
00000000002519C3 48 89 44 24 48       mov         qword ptr [aVar],rax 
      
   invoke printf,TZ("aVar %d"),aVar
00000000002519C8 48 8B 54 24 48       mov         rdx,qword ptr [aVar] 
00000000002519CD 48 B9 47 54 25 00 00 00 00 00 mov         rcx,255447h 
00000000002519D7 E8 56 12 00 00       call        printf (0252C32h) 
      
   invoke testproc2,22   
00000000002519DC 48 C7 C1 16 00 00 00 mov         rcx,16h 
00000000002519E3 E8 EC FE FF FF       call        testproc2 (02518D4h) 
          
   ret       
00000000002519E8 66 0F 6F 74 24 40    movdqa      xmm6,xmmword ptr [rsp+40h] 
00000000002519EE 66 44 0F 6F 7C 24 50 movdqa      xmm15,xmmword ptr [bVar] 
00000000002519F5 48 83 C4 60          add         rsp,60h 
00000000002519F9 5B                   pop         rbx 
00000000002519FA C3                   ret
This was built in a long time ago by Japheth; I have just changed it to ymm instead of xmm for the newest version.
However, you can choose which version you want to build, because I have made a switch in proc.c:
Quote
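#if EVEXSUPP
#define OP_XYZMM  OP_ZMM   /* AVX-512: save zmm registers (64 bytes each) */
#define XYZMMsize 64
#elif AVXSUPP
#define OP_XYZMM  OP_YMM   /* AVX/AVX2: save ymm registers (32 bytes each) */
#define XYZMMsize 32
#else
#define OP_XYZMM  OP_XMM   /* SSE: save xmm registers (16 bytes each) */
#define XYZMMsize 16
#endif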
However, you always have the choice to preserve any register manually, without the USES command
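
To make the effect of that switch a bit more concrete, here is a minimal C sketch of how a size constant like `XYZMMsize` could decide how many bytes get reserved for the vector registers named in USES. It is not the actual proc.c code: the `EVEXSUPP`/`AVXSUPP` values are an assumed build configuration, and `save_area_bytes` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed build configuration for this sketch: AVX on, AVX-512 off. */
#define EVEXSUPP 0
#define AVXSUPP  1

/* Same selection logic as the switch quoted above. */
#if EVEXSUPP
#define XYZMMsize 64   /* zmm: 512 bits */
#elif AVXSUPP
#define XYZMMsize 32   /* ymm: 256 bits */
#else
#define XYZMMsize 16   /* xmm: 128 bits */
#endif

/* Hypothetical helper: bytes needed to save nregs vector registers. */
unsigned save_area_bytes(unsigned nregs)
{
    assert(nregs <= 32);   /* at most zmm0..zmm31 */
    return nregs * XYZMMsize;
}
```

With the configuration assumed here, `save_area_bytes(2)` gives 64, which is the room needed for the ymm6 and ymm15 slots in the first listing; built as the xmm variant it would be 32.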




Cod-Father

habran

rrr314159 thanks for understanding, I was just about to abandon the project :t
Cod-Father

jj2007

Quote from: rrr314159 on June 05, 2015, 06:04:51 AM
I wrote macros to push / pop xmm and ymm registers (pushx, popx, pushy, popy), simple and useful. Accepts a list as well as just one register

Here's the 32-bit version ;-)

include \masm32\MasmBasic\MasmBasic.inc
; PushX: push a space-separated list of 32-bit GPRs and xmm registers
PushX MACRO regs
LOCAL ct, reg$, mode
  ct=0
  all$ equ < >
  forc c,<&regs>
    if ct
      reg$ CATSTR reg$, <c>
      ct=ct-1
      ife ct
        % echo [reg$]
        all$ CATSTR reg$, <#>, all$
        if mode
          sub esp, OWORD
          movups OWORD PTR [esp], reg$
        else
          push reg$
        endif
      endif
    elseifidni <c>, <e>
      ct=2
      mode=0
      reg$ equ <c>
    elseifidni <c>, <x>
      ct=3
      mode=1
      reg$ equ <c>
    elseifnb <c>
      .err @CatStr(<what the heck is this: [>, <c>, <]>)
    endif
  ENDM
  % echo ALL=[all$]
ENDM

; PopX: pop everything recorded in all$ by the last PushX, in reverse order
PopX MACRO
LOCAL is, c$, r$
  is INSTR all$, <#>
  While is
    c$ SUBSTR all$, 1, 1
    r$ substr all$, 1, is-1
    all$ substr all$, is+1
    ifidni c$, <x>
      movups r$, oword ptr [esp]
      add esp, oword
    else
      pop r$
    endif
    is INSTR all$, <#>
  ENDM
ENDM

  Init
  mov esi, 11111111
  mov edi, 22222222
  mov ebx, 33333333
  movd xmm0, esi
  movd xmm1, edi
  movd xmm2, ebx
  deb 4, "before", esi, edi, ebx, xmm0, xmm1, xmm2, esp
  PushX esi  edi  ebx  xmm0 xmm1 xmm3
  nop
  PopX
  deb 4, "after", esi, edi, ebx, xmm0, xmm1, xmm2, esp
  Exit
end start

rrr314159

Quote from: Habran
rrr314159 thanks for understanding, I was just about to abandon the project :t

- great, I try to do one useful thing each month - that takes care of June!

@jj2007,

great minds think alike! But my version is simpler, partly because it's just for my use - unlike MasmBasic which gets used by many. I only do xmm's (or, ymm's separate version) not GPR's, and no error checking. Why don't u have commas between reg's, isn't that easier (don't have to use forc)? Hmm ... guess it's because u save the string (all$) for later use by PopX. Again I'm simpler, user (i.e. me) has to type the list in again for popping, and must be in correct (inverse of course) order. Only extra I've got is to make sure stack is aligned at 16 - but even for 64 bit, u can use movups instead of movaps. I was influenced by MS 64-bit API, but now that u mention it, not necessary
I am NaN ;)

rrr314159

@HJWasm,

BTW I believe I have the copyright on that name, see this post, near the bottom :biggrin:

Look forward to the day you feel you've put so much into it it can be called "Hasm", so much easier to pronounce (easier than JWasm too)

It occurred to me that if an organization, with .org address, were to write an asm, they would get a lot of extra hits when they named it ... :eusa_naughty:
I am NaN ;)

jj2007

Quote from: rrr314159 on June 05, 2015, 07:56:55 AM
Why don't u have commas between reg's, isn't that easier (don't have to use forc)?

Inspired by somealgo proc uses esi edi ebx (<<< no commas)

Quote
Only extra I've got is to make sure stack is aligned at 16 - but even for 64 bit, u can use movups instead of movaps.

I thought of that, too. How do you do that? Runtime check, or assembly-time assumptions about the stack being aligned? One could also eliminate some of the add esp, OWORD code.
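
For the run-time variant, one common approach (an assumption on my part, not necessarily what rrr314159's macros do) is to test the low bits of the stack pointer, or to round it down with a mask, which is what `and esp, -16` does. A C sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero when addr is a multiple of align (align must be a power of two). */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Round addr down to a multiple of align, like "and esp, -16" for align 16. */
uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}
```

Rounding down moves the stack pointer by an amount that is only known at run time, so the original value has to be kept somewhere (in ebp, for example) to restore it afterwards. The same helpers work for 32 when aligned ymm moves are wanted.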

Here is a little trick to keep the user from forgetting PopX (it doesn't generate any code):
PushX ...
  ENDM
  % echo ALL=[all$]
  .if 1
ENDM

PopX
...
  ENDM
  .endif
ENDM


Btw this is not (yet) MB - just hacked that together because I liked the idea.

habran

Quote from: rrr314159 on June 05, 2015, 08:51:21 AM
@HJWasm,

BTW I believe I have the copyright on that name, see this post, near the bottom :biggrin:
I already admitted it in one of my posts

>It occurred to me that if an organization, with .org address, were to write an asm, they would get a lot of extra hits when they named it ... :eusa_naughty:
please explain
Cod-Father

rrr314159

Quote from: Habran
I already admitted it in one of my posts

- Yes I saw that post, couple months ago, and am just kidding

Quote from: Habran
please explain

- Now you're kidding, right?
I am NaN ;)

Gunther

Hi Habran,

Quote from: habran on June 05, 2015, 06:27:35 AM
what are you talking about Gunther :icon_eek:
I am a little bit disappointed with you my friend, I thought that you were aware that
current version of JWasm does this:
[...]
This was built in a long time ago by Japheth; I have just changed it to ymm instead of xmm for the newest version.

yes, I've checked that. It was my fault. No offense.

Gunther
You have to know the facts before you can distort them.

habran

None taken Gunther :biggrin:

3RPi I knew that you were kidding but I was not, what does that mean?
Cod-Father