HJWasm 2.15 uploaded

Started by habran, September 05, 2016, 09:14:38 AM

habran

Hi everyone!

Just to inform you that HJWasm 2.15 is uploaded on Teraspace 8)
Johnsa and I lost several kilos (not that we regret it) to make it happen :biggrin:
VECTORCALL was Johnsa's idea and I'll never forgive him for pushing me so hard to w**k on it, however he has done a big part of it and proved to me that he is a brilliant programmer :t

What we have done is:
1.) Allowed xmm, ymm and zmm registers to be saved at the same time with USES.
2.) Fixed a problem with the reserved stack size.
3.) Implemented VECTORCALL (R-VECTORCALL).
4.) Implemented SSE compatibility with ML64 (automatic xmmword type promotion with the -Zg switch).
5.) Fixed the bug with MOVSS.
6.) Fixed the EIP/RIP encoding bug.

The attached files contain all the structures and unions needed for VECTORCALL, thanks to Johnsa.
For anyone interested in VECTORCALL, we have also provided some code examples.

We believe that this version is the best yet.

Enjoy :biggrin:

Cod-Father

jj2007

Quote from: habran on September 05, 2016, 09:14:38 AM
We believe that this version is the best yet.

Looks good :t

The 32-bit version is 10% faster than HJWasm64 8)

habran

Thanks jj2007 :biggrin:
Just to let you know that the VECTORCALL is available only in 64 bit.
I hope no one will demand it in 32 bit, because I don't want to lose another 5-6 kilos, it would require a change of my clothing size :P

johnsa


It was indeed a lot .. of.. "fun" to implement vectorcall; it's a typically arse-backward standard that could have been much simpler!
But nonetheless it's in, and it does have a very good reason to exist.

Looking forward to 2.16 and 2.17, we have a long list of ideas for new features (many of which I'm sure the purists will not agree with) :)
These are things that I personally would find very helpful, especially when maintaining larger code bases:

1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore; the other advantage is that we can optimise the generated string table by merging duplicate strings.
2) Overloaded procedures: we identify the relevant PROC by name AND parameter types.. this one is super helpful to me, especially when combined with 3.
3) Support namespaces.


NAMESPACE VectorMath

Normalize PROC VECTORCALL FRAME vec:__m256d
  ret
Normalize ENDP

Normalize PROC VECTORCALL FRAME vec:__m128f
  ret
Normalize ENDP

Normalize PROC VECTORCALL FRAME vec:hfa3
  ret
Normalize ENDP

ENDS

Now I can just invoke with:

invoke VectorMath.Normalize, myVector ; myVector could be hfa3/4-element float simd type or 4-element double simd type.


As with all of these things, they're not mandatory so you can ignore namespaces and all existing code works as-is by being in the default global namespace.

These are just some ideas, and of course other suggestions are always welcome!
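
The overload idea in point 2 can be pictured as a lookup keyed on namespace, name and parameter type. The following is only a sketch in C, not HJWasm code: the `Overload` table, the `resolve_overload` helper and the internal label names are all hypothetical, and a real assembler would key on its own type records, not on strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table row: one entry per overloaded PROC. */
typedef struct {
    const char *ns;         /* namespace, e.g. "VectorMath" */
    const char *name;       /* procedure name, e.g. "Normalize" */
    const char *param_type; /* type of the single parameter */
    const char *label;      /* made-up unique internal label */
} Overload;

static const Overload overloads[] = {
    { "VectorMath", "Normalize", "__m256d", "VectorMath_Normalize_m256d" },
    { "VectorMath", "Normalize", "__m128f", "VectorMath_Normalize_m128f" },
    { "VectorMath", "Normalize", "hfa3",    "VectorMath_Normalize_hfa3"  },
};

/* Return the internal label whose namespace, name and parameter type all
   match, or NULL when no overload fits the argument type. */
const char *resolve_overload(const char *ns, const char *name,
                             const char *arg_type)
{
    assert(ns != NULL && name != NULL && arg_type != NULL);
    size_t count = sizeof overloads / sizeof overloads[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(overloads[i].ns, ns) == 0 &&
            strcmp(overloads[i].name, name) == 0 &&
            strcmp(overloads[i].param_type, arg_type) == 0) {
            return overloads[i].label;
        }
    }
    return NULL;
}
```

With this table, an invoke of VectorMath.Normalize with an hfa3 argument would resolve to the third entry, while an argument type with no matching PROC yields NULL, which an assembler would presumably report as an error.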

jj2007

Quote from: johnsa on September 05, 2016, 06:50:35 PM
1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore, the other advantage is that we can optimise the string table produced down to replace duplicate strings.

It's worth a try. Duplication can be avoided with macros, too; this one uses the same memory location:
  PrintLine "This is a test"
  PrintLine "This is a test"


Still, it would make life easier to let the compiler organise that. Re direct literal string support, it's already implemented in the rv() macro that some of us use. Again, it would do no harm to add it.
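
As a rough picture of what letting the assembler merge duplicate strings could look like, here is a minimal string-pool sketch in C. It is an assumption about the general technique, not a description of HJWasm's internals: `StringPool`, `pool_intern` and the fixed `POOL_BYTES` size are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_BYTES 4096

/* Hypothetical string table: NUL-terminated literals stored back to back. */
typedef struct {
    char   bytes[POOL_BYTES];
    size_t used;
} StringPool;

/* Return the offset of `text` inside the pool. The literal is appended only
   if an identical one is not already stored; a linear scan keeps this short. */
size_t pool_intern(StringPool *pool, const char *text)
{
    size_t len = strlen(text) + 1;            /* include the terminator */
    size_t off = 0;
    while (off < pool->used) {
        if (strcmp(pool->bytes + off, text) == 0)
            return off;                        /* duplicate: reuse it */
        off += strlen(pool->bytes + off) + 1;  /* step to the next literal */
    }
    assert(pool->used + len <= POOL_BYTES);    /* sketch: no growth logic */
    memcpy(pool->bytes + pool->used, text, len);
    pool->used += len;
    return off;                                /* offset of the new literal */
}
```

Interning "This is a test" twice returns the same offset both times, so both call sites would end up pointing at a single copy in the data section.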

These days I don't agree much with Hutch :(
However, on one point I am perfectly in line with him: Your compiler, pardon: assembler, is fine, what's missing is the codebase - macros and libraries.

TWell

Quote from: johnsa on September 05, 2016, 06:50:35 PM
1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore, the other advantage is that we can optimise the string table produced down to replace duplicate strings.
:t :t :t

After that, will this be possible too?
.data
msgA db "ANSI",13,10,0
msgW dw L"UNICODE",13,10,0

johnsa

That is the intention yes, for data declaration as well as directly in invoke.
Possibly also direct use with opcodes like:

lea rax,"This is an ASCII string"
lea rdx,L"This is a unicode string"

TWell

Is it possible to avoid using those sub rsp/add rsp in every invoke with some option switch?

johnsa



TestProc10:
000000013FC1181A 48 83 EC 28          sub         rsp,28h   ;HERE
000000013FC1181E 48 8D 04 24          lea         rax,[rsp] 
000000013FC11822 48 89 44 24 30       mov         qword ptr [a],rax 
000000013FC11827 C5 F8 10 44 24 30    vmovups     xmm0,xmmword ptr [a] 
000000013FC1182D 48 83 C4 28          add         rsp,28h   ;HERE
000000013FC11831 C3                   ret 



Are you referring to the above as part of the prologue/epilogue?
HJWasm won't add any stack modification if you don't have/use any locals or arguments.
Removing it when there are locals/arguments wouldn't add any value, as the stack references would need to use negative offsets and would probably perform worse than the single add/sub.. in addition, it would make it very difficult to handle nested or recursive calls.
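
For readers wondering where a value like 28h comes from: under the general Win64 calling convention it is consistent with 32 bytes of home space for four register arguments plus 8 bytes of padding, because RSP is 8 bytes off a 16-byte boundary on entry (the return address has just been pushed). The C sketch below only illustrates that arithmetic. It is not HJWasm's actual algorithm, `win64_frame_size` is a hypothetical helper, and it assumes no registers are pushed in the prologue.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a Win64 procedure would subtract from RSP in its prologue:
   home space for at least four register arguments, plus locals, padded so
   that RSP is 16-byte aligned at every call it makes. On entry RSP is
   8 bytes off alignment because of the pushed return address.
   Assumes no registers are pushed before the subtraction. */
size_t win64_frame_size(size_t max_call_args, size_t local_bytes)
{
    size_t slots = max_call_args < 4 ? 4 : max_call_args;
    size_t size  = slots * 8 + local_bytes;
    size = (size + 7) & ~(size_t)7;   /* round up to 8-byte granularity */
    if (size % 16 == 0)
        size += 8;                     /* compensate for the return address */
    assert(size % 16 == 8);            /* invariant of this sketch */
    return size;
}
```

Four argument slots and no locals give 40 bytes, which is the 28h seen in the listing above; adding 8 bytes of locals still gives 40, because the padding slot is simply used for the local.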

TWell

No, just a simple invoke of printf.
I'm comparing HJWasm and PoAsm.

OPTION WIN64:2 is good for my tests.

johnsa

Can you send an example of your printf invoke that's generating add/sub rsps?
thanks

mineiro

Congratulations, sirs habran and johnsa.
Downloading and trying.
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

TWell

option casemap :none
option epilogue:none
option prologue:none
OPTION WIN64:2

exit proto :dword
printf proto args:vararg
includelib msvcrt64.lib

.data
msg  db "Hello msvcrt.dll",13,10,0

.code
mainCRTStartup proc
  invoke printf,offset msg
  invoke exit, 0
mainCRTStartup endp
end

johnsa

That all seems very odd.. with some modification:



--- b.asm ----------------------------------------------------------------------
mainCRTStartup:
000000013F161010 48 83 EC 20          sub         rsp,20h 
000000013F161014 48 8D 0D E5 3F 00 00 lea         rcx,[msg (013F165000h)] 
000000013F16101B E8 14 10 00 00       call        printf (013F162034h) 
000000013F161020 33 C9                xor         ecx,ecx 
000000013F161022 E8 07 10 00 00       call        exit (013F16202Eh) 



The sub rsp,20h is the prologue for mainCRTStartup and has nothing to do with the invoke of printf.
You also used OFFSET instead of ADDR, which wasn't generating the right code.
I also changed my source to msvcrt.lib (I don't know if yours is actually called msvcrt64.lib).

I've set the HJWasm options to the optimal ones: automatic frame, RSP as the stack base, and win64:11:



.x64
option casemap:none
option win64:11
option frame:auto
option STACKBASE:RSP

exit proto :dword
printf proto args:vararg
includelib msvcrt.lib

.data
msg  db "Hello msvcrt.dll",13,10,0

.code
mainCRTStartup proc
  invoke printf,ADDR msg
  invoke exit, 0
mainCRTStartup endp
end


TWell

This is the code I mean:
00000000 sub      rsp, 20h                 ; 4883ec20
00000004 lea      rcx, [rip+0h]            ; 488d0d00000000
0000000b call     printf                   ; e800000000
00000010 add      rsp, 20h                 ; 4883c420
00000014 sub      rsp, 20h                 ; 4883ec20
00000018 xor      ecx, ecx                 ; 33c9
0000001a call     exit                     ; e800000000
0000001f add      rsp, 20h                 ; 4883c420

I was testing a naked function to see when it crashes, so assembler intervention wasn't an option.
I have already found the right options.