The MASM Forum

64 bit assembler => UASM Assembler Development => Topic started by: habran on September 05, 2016, 09:14:38 AM

Title: HJWasm 2.15 uploaded
Post by: habran on September 05, 2016, 09:14:38 AM
Hi everyone!

Just to inform you that HJWasm 2.15 is uploaded on Terraspace (http://www.terraspace.co.uk/hjwasm.html) 8)
Johnsa and myself lost several kilos (not that we regret it) to make it happen :biggrin:
VECTORCALL was Johnsa's idea and I'll never forgive him for pushing me so hard to w**k on it, however he has done a big part of it and proved to me that he is a brilliant programmer :t

What we have done is:
1.) Allowed xmm, ymm and zmm registers to be saved at the same time with USES.
2.) Fixed problem with the reserved stack size
3.) Implemented VECTORCALL (R-VECTORCALL)
4.) Implemented SSE compatibility with ML64 (automatic xmmword type promotion with switch -Zg)
5.) Fixed the bug with MOVSS
6.) Fixed EIP/RIP encoding bug

The attached files contain all necessary structures and unions for VECTORCALL, thanks to Johnsa.
If someone is interested in VECTORCALL, we have also provided some code examples.

We believe that this version is the best yet.

Enjoy :biggrin:

Title: Re: HJWasm 2.15 uploaded
Post by: jj2007 on September 05, 2016, 11:37:22 AM
Quote from: habran on September 05, 2016, 09:14:38 AM
We believe that this version is the best yet.

Looks good :t

The 32-bit version is 10% faster than HJWasm64 8)
Title: Re: HJWasm 2.15 uploaded
Post by: habran on September 05, 2016, 11:56:48 AM
Thanks jj2007 :biggrin:
Just to let you know that the VECTORCALL is available only in 64 bit.
I hope no one will demand it in 32 bit, because I don't want to lose another 5-6 kilos, it would require a change of my clothing size :P
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 05, 2016, 06:50:35 PM

It was indeed a lot .. of.. "fun" to implement vectorcall, it's a typically arse-backward standard that could have been much simpler!
But nonetheless it's in, and it does have a very good reason to exist.

Looking forward to 2.16 and 2.17 we have a long list of ideas for new features (many I'm sure the purists will not agree on) :)
These are things that I personally would find very helpful, especially when maintaining larger code bases:

1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore. The other advantage is that we can optimise the generated string table by merging duplicate strings.
2) Overloaded procedures, we identify the relevant PROC by name AND parameter types.. this one is super helpful to me especially when combined with 3.
3) Support namespaces.


NAMESPACE VectorMath

Normalize PROC VECTORCALL FRAME vec:__m256d
   ret
Normalize ENDP

Normalize PROC VECTORCALL FRAME vec:__m128f
  ret
Normalize ENDP

Normalize PROC VECTORCALL FRAME vec:hfa3
  ret
Normalize ENDP

ENDS

Now I can just invoke with:

invoke VectorMath.Normalize, myVector ; myVector could be hfa3/4-element float simd type or 4-element double simd type.


As with all of these things, they're not mandatory so you can ignore namespaces and all existing code works as-is by being in the default global namespace.
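
To make the overload idea concrete, here is a minimal Python sketch of resolving a PROC by namespace, name AND parameter types. It is only a model of the proposal: the OverloadTable class and the mangled label format are hypothetical and say nothing about how HJWasm would actually implement this. The type names are the ones from the example above.

```python
class OverloadTable:
    """Hypothetical model: one entry per (namespace, name, parameter types)."""

    def __init__(self):
        self._procs = {}

    def declare(self, namespace, name, param_types):
        key = (namespace, name, tuple(param_types))
        if key in self._procs:
            raise ValueError(f"duplicate overload: {key}")
        # Assumed mangling scheme, for illustration only: it just has to
        # give each overload body a distinct internal label.
        label = "_".join([namespace, name, *param_types])
        self._procs[key] = label
        return label

    def resolve(self, namespace, name, arg_types):
        # An invoke picks the body whose parameter types match the arguments.
        return self._procs[(namespace, name, tuple(arg_types))]


table = OverloadTable()
for vec_type in ("__m256d", "__m128f", "hfa3"):
    table.declare("VectorMath", "Normalize", [vec_type])

print(table.resolve("VectorMath", "Normalize", ["hfa3"]))
```

Code that never declares a namespace would simply use one fixed default key in the first slot, which matches the "default global namespace" remark above.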

These are just some ideas and of course we always welcome other suggestions!
Title: Re: HJWasm 2.15 uploaded
Post by: jj2007 on September 05, 2016, 08:29:47 PM
Quote from: johnsa on September 05, 2016, 06:50:35 PM
1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore. The other advantage is that we can optimise the generated string table by merging duplicate strings.

It's worth a try. Duplication can be avoided with macros, too; this one uses the same memory location:
  PrintLine "This is a test"
  PrintLine "This is a test"


Still, it would make life easier to let the compiler organise that. Re direct literal string support, it's already implemented in the rv() macro that some of us use. Again, it would do no harm to add it.
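
As a rough illustration of what "let the assembler organise that" could mean, the Python sketch below pools string literals so that a repeated literal reuses the same offset. The function name build_string_table and the table layout (zero-terminated ASCII, packed back to back) are assumptions made for this sketch, not a description of HJWasm or of any existing macro.

```python
def build_string_table(literals):
    """Return (table_bytes, offsets): one offset per literal, duplicates shared."""
    seen = {}            # literal text -> offset of its first copy
    table = bytearray()
    offsets = []
    for text in literals:
        if text not in seen:
            seen[text] = len(table)
            table += text.encode("ascii") + b"\x00"
        offsets.append(seen[text])
    return bytes(table), offsets


table, offsets = build_string_table(
    ["This is a test", "This is a test", "other"]
)
print(offsets)      # [0, 0, 15]
print(len(table))   # 21
```

Both "This is a test" literals resolve to offset 0, so only one copy is stored.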

These days I don't agree much with Hutch :(
However, on one point I am perfectly in line with him: Your compiler, pardon: assembler, is fine, what's missing is the codebase - macros and libraries.
Title: Re: HJWasm 2.15 uploaded
Post by: TWell on September 05, 2016, 09:19:55 PM
Quote from: johnsa on September 05, 2016, 06:50:35 PM
1) Direct literal string support on invoke.. "" and L"" .. so we don't need a text macro anymore. The other advantage is that we can optimise the generated string table by merging duplicate strings.
:t :t :t

After that, is this possible too?
.data
msgA db "ANSI",13,10,0
msgW dw L"UNICODE",13,10,0
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 05, 2016, 10:21:01 PM
That is the intention yes, for data declaration as well as directly in invoke.
Possibly also direct use with opcodes like:

lea rax,"This is an ASCII string"
lea rdx,L"This is a unicode string"
Title: Re: HJWasm 2.15 uploaded
Post by: TWell on September 05, 2016, 11:04:38 PM
Is it possible to avoid using those sub rsp/add rsp in every invoke with some option switch?
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 05, 2016, 11:11:42 PM


TestProc10:
000000013FC1181A 48 83 EC 28          sub         rsp,28h   ;HERE
000000013FC1181E 48 8D 04 24          lea         rax,[rsp] 
000000013FC11822 48 89 44 24 30       mov         qword ptr [a],rax 
000000013FC11827 C5 F8 10 44 24 30    vmovups     xmm0,xmmword ptr [a] 
000000013FC1182D 48 83 C4 28          add         rsp,28h   ;HERE
000000013FC11831 C3                   ret 



Are you referring to the above as part of the prologue/epilogue?
HJWASM won't add any stack modification if you don't have/use any locals or arguments.
Removing it when there are locals/arguments wouldn't add any value, as the stack references would need to use negative offsets and would probably perform worse than the single add/sub. In addition, it would make it very difficult to handle nested or recursive calls.
Title: Re: HJWasm 2.15 uploaded
Post by: TWell on September 05, 2016, 11:23:52 PM
No, just a simple invoke of printf.
I am comparing HJWasm and PoAsm.

OPTION WIN64:2 is good for my tests.
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 06, 2016, 12:48:10 AM
Can you send an example of your printf invoke that's generating add/sub rsps?
thanks
Title: Re: HJWasm 2.15 uploaded
Post by: mineiro on September 06, 2016, 12:51:57 AM
Congratulations, sirs habran and johnsa.
Downloading and trying.
Title: Re: HJWasm 2.15 uploaded
Post by: TWell on September 06, 2016, 01:14:03 AM
option casemap :none
option epilogue:none
option prologue:none
OPTION WIN64:2

exit proto :dword
printf proto args:vararg
includelib msvcrt64.lib

.data
msg  db "Hello msvcrt.dll",13,10,0

.code
mainCRTStartup proc
  invoke printf,offset msg
  invoke exit, 0
mainCRTStartup endp
end
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 06, 2016, 01:35:08 AM
That all seems very odd.. with some modification:



--- b.asm ----------------------------------------------------------------------
mainCRTStartup:
000000013F161010 48 83 EC 20          sub         rsp,20h 
000000013F161014 48 8D 0D E5 3F 00 00 lea         rcx,[msg (013F165000h)] 
000000013F16101B E8 14 10 00 00       call        printf (013F162034h) 
000000013F161020 33 C9                xor         ecx,ecx 
000000013F161022 E8 07 10 00 00       call        exit (013F16202Eh) 



the sub rsp,20h is the prologue for mainCRTStartup and has nothing to do with the invoke of printf.
You also used OFFSET instead of ADDR which wasn't generating the right code.
I also changed my source to msvcrt.lib (don't know if yours is actually called msvcrt64.lib)

I've set the HJWASM options to the optimal settings: automatic frame, RSP for stackbase and win64:11:



.x64
option casemap:none
option win64:11
option frame:auto
option STACKBASE:RSP

exit proto :dword
printf proto args:vararg
includelib msvcrt.lib

.data
msg  db "Hello msvcrt.dll",13,10,0

.code
mainCRTStartup proc
  invoke printf,ADDR msg
  invoke exit, 0
mainCRTStartup endp
end

Title: Re: HJWasm 2.15 uploaded
Post by: TWell on September 06, 2016, 01:53:06 AM
This is the code I mean:
00000000 sub      rsp, 20h                 ; 4883ec20
00000004 lea      rcx, [rip+0h]            ; 488d0d00000000
0000000b call     printf                   ; e800000000
00000010 add      rsp, 20h                 ; 4883c420
00000014 sub      rsp, 20h                 ; 4883ec20
00000018 xor      ecx, ecx                 ; 33c9
0000001a call     exit                     ; e800000000
0000001f add      rsp, 20h                 ; 4883c420

I was testing a naked function to see when it crashes, so assembler intervention wasn't an option.
I have already found the right options.
Title: Re: HJWasm 2.15 uploaded
Post by: jj2007 on September 06, 2016, 02:51:00 AM
Quote from: johnsa on September 06, 2016, 01:35:08 AM
mainCRTStartup:
000000013F161010 48 83 EC 20          sub         rsp,20h
000000013F161014 48 8D 0D E5 3F 00 00 lea         rcx,[msg (013F165000h)]
000000013F16101B E8 14 10 00 00       call        printf (013F162034h) 


the sub rsp,20h is the prologue for mainCRTStartup and has nothing to do with the invoke of printf.

Are you sure? I thought the ABI requires that you grant 4 QWORDS of stack to printf ::)

include \Masm32\MasmBasic\Res\JBasic.inc ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
Init
  int 3
  jinvoke crt_printf, Chr$("Hello msvcrt.dll, 13, 10")
EndOfCode


translates to:
CC                       | int3                               |
53                       | push rbx                           |
53                       | push rbx                           |
53                       | push rbx                           |
48 8D 0D EF 1F 00 00     | lea rcx, qword ptr ds:[140003004]  | 140003004:"Hello rt.dll, , "
51                       | push rcx                           |
FF 15 E4 20 00 00        | call qword ptr ds:[<&printf>]      |
48 83 C4 20              | add rsp, 20                        |
53                       | push rbx                           |
53                       | push rbx                           |
53                       | push rbx                           |
6A 00                    | push 0                             |
48 8B 0C 24              | mov rcx, qword ptr ss:[rsp]        | correct the stack
FF 15 D9 20 00 00        | call qword ptr ds:[<&RtlExitUserProcess |


A different way to allocate the 4 QWORDs.
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 06, 2016, 05:51:08 AM
Yes, the minimum reservation for any non-leaf function should be 32 bytes (which is 20h).
There is no need to allocate stack space per call; instead it should be done once per PROC by examining all the invokes which occur inside that proc and determining the maximum reservation for all of them.
So, for example, if you had:

Main PROC

invoke a
invoke b,1,2,3,4
invoke c,1,2

ENDP

There would be a single sub rsp,20h (because no PROC used inside Main requires more than 4 arguments).
However, if you had:

Main PROC

invoke a
invoke b,1,2,3,4
invoke c,1,2,3,4,5

ENDP

you would need 28h bytes to account for the 5 arguments of function c, but to keep the stack xmmword-aligned we round that up, so you get a single sub rsp,30h and the first local allocated in any proc is 16-byte aligned too.
This is better for cache access to the stack, and also saves a load of unnecessary add/sub (or in your example pushes).
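
The rule described above can be checked with a few lines of Python. This is only a model of the arithmetic as stated in this post (at least 4 slots of 8 bytes, sized for the widest invoke, rounded up to 16); the function and constant names are made up for the sketch, and a real prologue also has to account for locals, saved registers and the return address.

```python
SHADOW_SLOTS = 4   # minimum reservation: 4 QWORDs = 20h bytes
SLOT_BYTES = 8     # one QWORD per argument slot
ALIGNMENT = 16     # keep the reservation xmmword-aligned

def reserved_stack_bytes(invoke_arg_counts):
    """One reservation per PROC, sized for the invoke with the most arguments."""
    widest = max([SHADOW_SLOTS, *invoke_arg_counts])
    raw = widest * SLOT_BYTES
    return (raw + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT

# invoke a / invoke b,1,2,3,4 / invoke c,1,2
print(hex(reserved_stack_bytes([0, 4, 2])))   # 0x20
# invoke a / invoke b,1,2,3,4 / invoke c,1,2,3,4,5
print(hex(reserved_stack_bytes([0, 4, 5])))   # 0x30
```

The second case shows the rounding step: 5 arguments need 28h bytes, which becomes 30h.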

On a slightly unrelated note, we do a lot of these behind-the-scenes optimisations to set up the stack prologue/epilogue generation, managing alignment etc. in hjwasm.. which is another reason that invoke is so important (I believe there was a comment on another thread about it).. It allows the assembler to take care of things that would otherwise be extremely difficult or painful to manage manually (especially in 64bit), and would more than likely land up with either hard-to-find bugs or sub-optimal code.. and if writing assembly by hand lands up generating less optimal code than a C compiler .... then it really has lost any reason to exist!
This was one of the main drivers for having vectorcall support (apart from interop) .. we can't have a tool that doesn't provide every opportunity to write the most performant code possible with the minimum of trouble when some silly old HLL compiler can do it! :)
Title: Re: HJWasm 2.15 uploaded
Post by: habran on September 06, 2016, 06:07:45 AM
Quote
Are you sure? I thought the ABI requires that you grant 4 QWORDS of stack to printf ::)
You are correct JJ :t
Because TWell is not using FRAME and PROLOGUE and is using only OPTION WIN64:2, the 20h is the allocation of home (shadow) space for the 4 register parameters of each invoke:
    W64F_SAVEREGPARAMS = 0x01, /* 1=save register params in shadow space on proc entry */
    W64F_AUTOSTACKSP   = 0x02, /* 1=calculate required stack space for arguments of INVOKE */
    W64F_STACKALIGN16  = 0x04, /* 1=stack variables are 16-byte aligned; added in v2.12 */
    W64F_SMART         = 0x08, /* 1=takes care of everything */


A long time ago I wrote the first STACKBASE:RSP implementation and pushed Japheth to include it in the official version, because I was pissed off with the stupid adding and subtracting of 20h on every call of a subroutine.
Now, hutch is having the same problem with ML64

So, my answer to your original question:
Is it possible to avoid using those sub rsp/add rsp in every invoke with some option switch?
Yes, use this in the beginning of your source:
   option casemap:none
   option win64:11            ;W64F_SAVEREGPARAMS + W64F_AUTOSTACKSP + W64F_SMART
   option frame:auto          ;this writes PROLOGUE and EPILOGUE for you automatically
   option STACKBASE:RSP       ;this allocates home space only once for all invokes used in a PROC
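
For readers wondering where the 11 comes from: it is just the sum of the flag bits listed above. The short Python check below decodes a win64 value into flag names. The Win64Flags class and the describe helper are names invented for this sketch; only the four bit values come from the listing above.

```python
from enum import IntFlag

class Win64Flags(IntFlag):
    SAVEREGPARAMS = 0x01
    AUTOSTACKSP = 0x02
    STACKALIGN16 = 0x04
    SMART = 0x08

def describe(value):
    """List the names of the flag bits set in an OPTION WIN64 value."""
    return [flag.name for flag in Win64Flags if flag & value]

print(describe(2))    # ['AUTOSTACKSP']
print(describe(11))   # ['SAVEREGPARAMS', 'AUTOSTACKSP', 'SMART']
```

So OPTION WIN64:2 enables only the per-invoke stack calculation, while 11 = 1 + 2 + 8.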
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 06, 2016, 06:10:52 AM
I posted a bit further back the modified version to make use of these options, which generated the shorter/optimal proc without all the add/sub'ishness.
Quote
   .x64
   option casemap:none
   option win64:11
   option frame:auto
   option STACKBASE:RSP
   
exit proto :dword
printf proto args:vararg
includelib msvcrt.lib

.data
msg  db "Hello msvcrt.dll",13,10,0

.code
mainCRTStartup proc
  invoke printf,ADDR msg
  invoke exit, 0
mainCRTStartup endp
end

Exactly as per my post :) Habran and I both insisted on this back while Japheth was still in charge of JWasm. It's exactly the same way a compiler would generate it. Implementing this same sort of logic/optimisation from a macro would take a tremendous amount of trouble (I'm not sure it's possible at all), as it would require scoped knowledge of the PROC it's being used in and every other invoke within that same proc.. or you can go "low level" and roll this yourself in ML64 and manually update your stack allocation prologue every time you add an invoke anywhere.. fun times .. not ;)
Title: Re: HJWasm 2.15 uploaded
Post by: hutch-- on September 06, 2016, 06:25:55 AM
> Now, hutch is having the same problem with ML64

Only in the experimental stages, I have written a prologue/epilogue that is problem free and adjustable. Either high level code or it can be turned off for pure mnemonic code.

This empty proc,

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc


    ret

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

gives me this disassembly.

.text:0000000140001000 C8800000                   enter 0x80, 0x0
.text:0000000140001004 4883EC40                   sub rsp, 0x40
.text:0000000140001008 C9                         leave
.text:0000000140001009 C3                         ret

Both the ENTER 1st arg and the stack adjustment are adjustable and it has been super reliable.
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 06, 2016, 06:32:06 AM
How are you planning on handling the calculation of the correct amount to adjust RSP by? Using a forward reference to a total which is only determined once the epilogue is reached, and back-filling it into the prologue?
(I'm not sure that can be done with macros without trying it; it depends on whether forward references are allowed and on the order of macro expansion :) ) I guess you would have to completely replace ML64's invoke/prologue/epilogue generation and assuming the former idea worked you could possibly achieve the same sort of result as hjwasm invoke.
Title: Re: HJWasm 2.15 uploaded
Post by: hutch-- on September 06, 2016, 06:50:44 AM
This high level code generates the following disassembly.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

WndProc proc hWin:QWORD,uMsg:QWORD,wParam:QWORD,lParam:WORD

    LOCAL rct :RECT
    LOCAL buffer[128]:BYTE
    LOCAL pbuf :QWORD

    ret

WndProc endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

empty WndProc
sub_14000102b   proc
.text:000000014000102b C8800000                   enter 0x80, 0x0
.text:000000014000102f 4881ECE0000000             sub rsp, 0xe0
.text:0000000140001036 48894D10                   mov qword ptr [rbp+0x10], rcx
.text:000000014000103a 48895518                   mov qword ptr [rbp+0x18], rdx
.text:000000014000103e 4C894520                   mov qword ptr [rbp+0x20], r8
.text:0000000140001042 4C894D28                   mov qword ptr [rbp+0x28], r9
.text:0000000140001046 C9                         leave
.text:0000000140001047 C3                         ret
sub_14000102b   endp

An empty procedure with no stack frame generates the following.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

empty proc


    ret

empty endp

STACKFRAME

empty NOSTACKFRAME proc
.text:000000014000104d
.text:000000014000104d 0x14000104d:
.text:000000014000104d C3                         ret

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

Quote
I guess you would have to completely replace ML64's invoke/prologue/epilogue generation and assuming the former idea worked you could possibly achieve the same sort of result as hjwasm invoke.
ML64 comes unconfigured, so you have no choice other than to write a prologue/epilogue and an automated call notation, "invoke" being the most common. It calculates the byte count for the locals, then subtracts it from RSP while maintaining the correct alignment to provide the LOCAL space. Passed arguments are addressed above RSP.
Title: Re: HJWasm 2.15 uploaded
Post by: hutch-- on September 06, 2016, 07:10:29 AM
One more.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

stackframe_dbgdata equ <1>      ; turn on stackframe output
stackframe_default equ <256>    ; increase default stack size
stackframe_dynamic equ <512>    ; increase ENTER dynamic size

STACKFRAME

testproc proc a1:QWORD,a2:QWORD,a3:QWORD,a4:QWORD,a5:QWORD

    LOCAL var1 :QWORD
    LOCAL var2 :QWORD
    LOCAL var3 :QWORD
    LOCAL var4 :QWORD
    LOCAL var5 :QWORD
    LOCAL var6 :QWORD
    LOCAL var7 :QWORD
    LOCAL var8 :QWORD

    nop

    ret

testproc endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end

.  ****************************
.  PROLOGUE testproc
.  arg count   = 5
.  local bytes = 64
.  ****************************

sub_14000104d   proc
.text:000000014000104d C3                         ret
.text:000000014000104e C8000200                   enter 0x200, 0x0
.text:0000000140001052 4881EC40010000             sub rsp, 0x140
.text:0000000140001059 48894D10                   mov qword ptr [rbp+0x10], rcx
.text:000000014000105d 48895518                   mov qword ptr [rbp+0x18], rdx
.text:0000000140001061 4C894520                   mov qword ptr [rbp+0x20], r8
.text:0000000140001065 4C894D28                   mov qword ptr [rbp+0x28], r9
.text:0000000140001069 90                         nop
.text:000000014000106a C9                         leave
.text:000000014000106b C3                         ret
sub_14000104d   endp
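
The sub rsp values in the three listings above are consistent with a simple rule: default stack size plus LOCAL bytes, rounded up to a multiple of 16. The Python check below is only a reader's reconstruction from the disassembly, not the macro itself. The 0x40 default for the first two listings is inferred from the empty entry_point proc, and sub_rsp_amount is a name made up for this sketch.

```python
def sub_rsp_amount(default_bytes, local_bytes, alignment=16):
    """Default reservation plus LOCAL bytes, rounded up to the alignment."""
    total = default_bytes + local_bytes
    return (total + alignment - 1) // alignment * alignment

# entry_point: no locals, inferred default of 0x40
print(hex(sub_rsp_amount(0x40, 0)))             # 0x40
# WndProc: RECT (16 bytes) + buffer[128] + pbuf (8 bytes)
print(hex(sub_rsp_amount(0x40, 16 + 128 + 8)))  # 0xe0
# testproc: stackframe_default = 256, eight QWORD locals = 64 bytes
print(hex(sub_rsp_amount(256, 8 * 8)))          # 0x140
```

Only the WndProc case actually needs the rounding step (0xD8 becomes 0xE0); the other two are already multiples of 16.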
Title: Re: HJWasm 2.15 uploaded
Post by: jj2007 on September 06, 2016, 08:58:36 AM
Quote from: johnsa on September 06, 2016, 06:10:52 AM
Implementing this same sort of logic/optimisation from a macro would take a tremendous amount of trouble (I'm not sure it's possible at all), as it would require scoped knowledge of the PROC it's being used in and every other invoke within that same proc..

Quote from: johnsa on September 06, 2016, 06:32:06 AM
How are you planning on handling the calculation of the correct amount to adjust RSP by? Using a forward reference to a total which is only determined once the epilogue is reached, and back-filling it into the prologue?

It is not that difficult, actually: you start with a reasonable default (e.g. 12 args as in CreateWindowEx), and if that's not enough, let the EPILOGUE macro tell the user that he must manually increase the reserved stack. Plus, if the user is worried about running out of stack, e.g. in a recursive proc, he can start with a very low value and let the epilogue macro inform him how much is really needed. Manual intervention will be a very rare case.
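
The "reasonable default plus a message from the epilogue" approach can be modelled in a few lines of Python. This is only an illustration of the bookkeeping, not the actual macro: check_reservation and its message text are hypothetical, and the model assumes the macros can see the argument count of every invoke by the time the epilogue runs.

```python
SLOT_BYTES = 8   # one QWORD per argument slot

def check_reservation(reserved_slots, invoke_arg_counts):
    """Return None if the reservation is sufficient, else a hint for the user."""
    needed = max([4, *invoke_arg_counts])   # never less than the 4 home slots
    if needed > reserved_slots:
        return (f"reserved stack too small: need {needed} slots "
                f"({needed * SLOT_BYTES} bytes), have {reserved_slots}")
    return None

# Default of 12 slots covers a CreateWindowEx-sized call.
print(check_reservation(12, [4, 12, 2]))
# A deliberately low value makes the epilogue report what is really needed.
print(check_reservation(4, [1, 5]))
```

The first call prints None; the second prints the hint asking for 5 slots (40 bytes).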

I had it running already, but, see here (http://masm32.com/board/index.php?topic=5528.msg60079#msg60079), there is a really weird behaviour of the PROLOGUE macro that I am fighting with right now. Maybe one of the masters of the Watcom universe has a clue what happens there. Plain Masm32 test case attached - the Watcom assemblers also give a wrong line count, see @Line in the attached source.
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 07, 2016, 01:14:55 AM
Hi,

Just to let you all know .. I spotted a bug in the new invoke code when using parameter types like [rsi], [reg+ofs] and [reg].struct.member on fastcall procedures.. I've fixed this and updated both the repository and packages on the site so there's an update for you dated 6/9/2016.

John
Title: Re: HJWasm 2.15 uploaded
Post by: jj2007 on September 07, 2016, 02:41:52 AM
Perhaps you should post a more recent version here (http://www.terraspace.co.uk/hjwasm.html#p2):
HJWasm 2.15 (32bit)    6/08/2016    hjwasm215_x86.zip    32bit Binary Package (Windows)
HJWasm 2.15 (64bit)    6/08/2016    hjwasm215_x64.zip    64bit Binary Package (Windows)

6 August is one month ago 8)
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 07, 2016, 03:41:16 AM
arghh... dumb typo, fixing now :) thanks for spotting!
Title: Re: HJWasm 2.15 uploaded
Post by: jj2007 on September 07, 2016, 05:36:07 AM
No problem, I saw the correct timestamp in the zip archive :P

There is still the prolog issue. What I found out is that
- the prolog macro kicks in only when the assembler hits the first real instruction after the locals (ML and Watcom)
- within the prolog macro, the @Line macro yields the sometest proc line in ML but the line after the locals plus 5 in the Watcom family.

Attached new plain Masm32 testbed. I assume it would be the same for 64-bit code.
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 07, 2016, 07:29:00 AM
I've found the bits of code responsible for it.. I'm just not 100% sure yet what to do about it..  ::)
Technically the macro is being run in the same place by both ML and the Watcom family, and our line number is more "correct" than ML's (not to say there isn't a reason for a custom prologue to do this),
so can you help me understand why you need the line number at time of prologue execution == the proc line? (maybe it will help decide how best to change it)

Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 07, 2016, 07:45:56 AM
As a fix.. I've implemented a new built-in equate @ProcLine

which gives us:


SomeProlog MACRO procname, flags, argbytes, localbytes, reglist, userparms
Local tmp$, up$, alignOK, alignBad, is, alignedUses
  pLine=@Line
  % echo ## prologue of procname ##
  hello$ equ <The prologue macro has changed the string>
  push ebp
  mov ebp, esp ; create frame
  up$ CATSTR <userparms>, < >
  tmp$ CATSTR <line >, %@Line, < (should be line 24, ok with ML but not with Watcom family)>
  % echo tmp$
  tmp$ CATSTR <## PROLOG, line >, %@Line, <: &procname>, <: args+locals=>, <argbytes>, <+>, <localbytes>, <=>, %(argbytes+localbytes),  <, _flags=>, <flags>,  <, _userparms=>, <userparms>
  % echo tmp$
  % echo ## end of procname prologue ##
  IFDEF @ProcLine
  tmp$ CATSTR <line >, %@Line, < - proc line >, %@ProcLine
  ELSE
  tmp$ CATSTR <line >, %@Line, < - proc line >, %@Line
  ENDIF
  % echo tmp$
  EXITM %localbytes
ENDM


Which gives the correct results then for both ML and HJWASM.
If this works for you I will update source/packages?
Title: Re: HJWasm 2.15 uploaded
Post by: johnsa on September 07, 2016, 08:06:53 AM
I've uploaded this change, give it a try.

You can now use @ProcLine or @Line (just use it inside the IFDEF) and it should work with both MASM and HJWASM.

Cheers
John
Title: Re: HJWasm 2.15 uploaded
Post by: jj2007 on September 07, 2016, 09:09:39 AM
Quote from: johnsa on September 07, 2016, 07:29:00 AM
so can you help me understand why you need the line number at time of prologue execution == the proc line? (maybe it will help decide how best to change it)

Thanks for providing an ML-compatible solution. The reason why I am using this is a bit complicated: the jinvoke macro can produce 64-bit code in two ways,
- by pushing 4 times args and dummy args
- by moving args into their stack locations

The switch is the cs in the prolog macro's userparms, i.e.
WndProc proc <cb cs> uses rsi rdi rbx hWnd:HWND, uMsg:UINT, wParam:WPARAM, lParam:LPARAM
LOCAL ps:PAINTSTRUCT
  cmp uMsg, WM_CREATE
  jne not_create


cs stands for "compiler style". If the prolog macro sees cs, it allocates stack for all jinvokes that follow and sets a flag; if not, the jinvoke macro must create the 4*QWORD stack every time.

To achieve this, prolog contains jbCompStyle INSTR <userparms>, <cs>

Now, the only problem is that this variable, meant for jinvoke, is not yet set if, and only if, jinvoke is the first command after the LOCALs. Therefore I wanted to check, in jinvoke, if it is the first command, and force an error ("put a nop before jinvoke") if that is the case.

Now this is the reason behind the wish to have ML compatibility. What strikes me, though, is that this check regards the last Local line, not the proc line - which is closer to the Watcom behaviour than to ML ::)

I know this is a very exotic request, so don't take it too seriously :P