The MASM Forum

64 bit assembler => UASM Assembler Development => Topic started by: johnsa on April 01, 2017, 08:44:22 AM

Title: HJWasm 2.23 Release
Post by: johnsa on April 01, 2017, 08:44:22 AM
What's in it?

1) stackbase:esp back in (thanks Nidud!)

2) incbin offset bug fixed (reported by Vortex)

3) fixed a bug in coff 32bit name mangle output (reported by JJ)

4) command line switch -nomlib to disable the built-in macro library (if you need or want to)

5) stackbase:rsp and stackbase:rbp code has been totally separated, refactored and optimised.

6) both of these options have been simplified as follows:
   - option stackbase:rsp will enforce win64:11 and frame:auto. All procs with default prologue/epilogue settings will be frame procs.
   - option stackbase:rbp will enforce frame:auto but leaves option win64:1 -> 7 available. All procs with default prologue/epilogue settings will be frame procs.
   - Both align the first local to 16 when required.
   - RSP already had all the smart optimisations, which have now been carried over to RBP and extended.
   - If no locals or parameters are used at all, the frame pointer will be omitted.
   - If the procedure is a leaf proc (i.e. no locals and no further invokes), the reservation of stack via sub/add rsp will be automatically removed.
   - Both options will only copy parameters to home space "IF" they're actually used, so if you just use the source registers you get an optimised proc out.

7) There is now an OSX universal binary package on the site for 2.23
     (It was an absolute pain to get GCC on OSX to build it at all, and then to NOT give Bus Error 10s.)

Enjoy!

PS, We're now implementing the macros for 2.24 - due in store shortly ;)
Title: Re: HJWasm 2.23 Release
Post by: fearless on April 01, 2017, 09:08:42 AM
Nice!

Thanks for all your hard work.
Title: Re: HJWasm 2.23 Release
Post by: jj2007 on April 01, 2017, 09:45:10 AM
Quote from: fearless on April 01, 2017, 09:08:42 AM
Thanks for all your hard work.

Same from me :t

Builds my larger sources without any problems:
; RichMasm source, 18786 lines:
OxPT_Assembler HJWasm32 ; 1555 ms
OxPT_Assembler JWasm ; 1330 ms
OxPT_Assembler HJWasm64 ; 1274 ms
OPT_Assembler AsmC ; 1085 ms

; MasmBasic source, 31670 lines:
OxPT_Assembler mlv615 ; 8.0 secs
OxPT_Assembler JWasm ; 6.9 secs
OxPT_Assembler HJWasm32 ; 3.7 secs
OxPT_Assembler HJWasm64 ; 3.2 secs
OPT_Assembler AsmC ; 2.6 secs
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 01, 2017, 06:08:11 PM
Quote from: johnsa on April 01, 2017, 08:44:22 AM
- option stackbase:rsp will enforce win64:11 and frame:auto

I am not so sure that stackbase:rsp will enforce win64:11.
I have used:
OPTION casemap:none
OPTION FRAME:auto
OPTION WIN64:6
OPTION STACKBASE:RSP

and it disregarded WIN64:6, and the enforced win64:11 failed to align a local variable to a 16-byte boundary. So a movaps SSE instruction from an xmm register to local memory caused a crash.
I cannot produce a proof of concept right now; I will try early next Monday if necessary.
What I can say in advance is that the procedure uses FRAME and has no USES. I can also say that it does not produce an error with JWASM.
This is not important for me; in my case STACKBASE:RSP increases the code size and reduces execution speed. I think I mentioned that before.


Title: Re: HJWasm 2.23 Release
Post by: Vortex on April 01, 2017, 08:05:56 PM
Hi Johnsa and Habran,

Thanks for the new release. Keep up the nice work :t
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 02, 2017, 05:06:59 AM
I'm not too sure how the align-local-to-16 isn't working. Are you calling the function from an HLL, or is it being run directly from an asm app?

Here is an example from my side, and the locals bob/bob1 are aligned to 16 every time:



option frame:auto
option win64:6
option stackbase:rsp

;assemble with
; c:\jwasm\hjwasm64 -c -win64 -Zi -Zd -Zf -Zp8 aw.asm
; d:\vs2015\vc\bin\link /subsystem:console /machine:x64 /debug /entry:proc1 /Libpath:"%WINSDK%\v7.1\Lib\x64" aw.obj


__m128f struct
f0 real4 ?
f1 real4 ?
f2 real4 ?
f3 real4 ?
__m128f ends

__m128q struct
q0 QWORD ?
q1 QWORD ?
__m128q ends

__m128 union
f32 __m128f <>
q64 __m128q <>
__m128 ends

OPTION ARCH:AVX

    includelib kernel32.lib
    includelib user32.lib

externdef MessageBoxW : near
externdef MessageBoxA : near

MessageBoxW PROTO :qword, :qword, :qword, :qword
MessageBoxA PROTO :qword, :qword, :qword, :qword

.data

; Automatic type promotion from integer to float
aReal REAL4 2

; This is example of initializing a union with floats (first sub-type)
; using normal syntax as well as hjwasm 2.17 update to promote integer literal to float
myVec1 __m128 { < 1.0, 2.0, 3.0, 4.0 > }
myVec2 __m128 { < 1, 2, 3, 4 > }

; Hjwasm 2.22 enhanced union type (now allows direct specification of sub-type to use in initialisation):
myVec4 __m128.f32 { < 1.0, 2.0, 3.0, 4.0 > }   ; you can try .f33 and hjwasm will emit an error when testing for valid sub-type.
myVec3 __m128.q64 { < 0x1234, 0x5678 > }
myVec5 __m128.f32 { < 1.0, 2.0, 3.0, 4.0 > }   ; you can try .f33 and hjwasm will emit an error when testing for valid sub-type.

floatVar real4 2.3

awideStr dw "wide caption ",0

.code

start:

LOADSS xmm0,2.0

OPTION ARCH:SSE

LOADSS xmm1,3.0

OPTION ARCH:AVX

LOADSD xmm2,4.0

;this proc is creating a dud sub rsp,8 :( (FIXED)
proc2 proc public

   ret
proc2 endp

sub1 proc public arg1:ptr, arg2:ptr

   ret
sub1 endp

sub2 proc public uses rdi xmm0 arg1:ptr, arg2:ptr

   ret
sub2 endp

newproc3 proc arg1:qword, arg2:qword

ret
newproc3 endp

newproc proc arg1:qword, arg2:real4

movss xmm3,arg2 ; with option win64:7 , this loads from [rbp+20h] but it SHOULD be [rbp+18h] :(

ret
newproc endp

newproc2 proc FRAME arg1:qword, arg2:real4, arg3:dword, arg4:dword, arg5:dword

movss xmm3,arg2 ; with option win64:7 , this loads from [rbp+20h] but it SHOULD be [rbp+18h] :(
mov eax,arg3
mov ebx,arg4
mov ecx,arg5

ret
newproc2 endp

; This one will implement FPO (frame pointer omission, as no parameters or locals are used).
newproc5 proc FRAME arg1:qword, arg2:real4, arg3:dword, arg4:dword, arg5:dword

xor eax,eax
mov ebx,eax

ret
newproc5 endp

proc1 proc FRAME arg1:qword, arg2:qword, arg3 :qword
   
   local bob:XMMWORD
   local bob1:XMMWORD
   
   mov r9, rcx
   mov r10, rdx
   mov r11, r8

   invoke newproc3, rax, "this is an ascii string"
   movss xmm1, FP4(1.28)
   movss xmm1, FP4(2.28)
   movss xmm1, FP4(3.28)
   
invoke MessageBoxW, 0, ADDR awideStr, ADDR awideStr, 0
invoke MessageBoxA, 0, "yay string literals", "oops", 0

   invoke newproc3, rax,"this is an ascii string"
   invoke newproc3, rcx, L"a wide string yay"
   
    invoke MessageBoxW, 0, L"yay wide string literal", ADDR awideStr, 0
invoke MessageBoxA, 0, "yay string literals2", "oops", 0
invoke MessageBoxA, 0, "yay string literals3", "oops", 0
invoke MessageBoxA, 0, "yay string literals4", "oops", 0

invoke newproc2, rax,xmm4,ebx,r10d,r11d
invoke newproc5, rax,xmm4,ebx,r10d,r11d
   invoke newproc, rax, xmm4
   
   invoke newproc, rax, floatVar
   invoke newproc, rax, xmm1
   
   INVOKE sub1, r10, r8
   INVOKE sub2, r9, r11
   mov rax, r9
   
   vmovaps xmm0,bob
   vmovaps bob1,xmm1

   ret
proc1 endp


WinMainCRTStartup PROC FRAME
invoke proc1, 10, 20, 30
ret
WinMainCRTStartup ENDP

end WinMainCRTStartup


Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 02, 2017, 08:01:59 AM
This is how I get the problem.

#include "stdafx.h"
#if defined (__cplusplus)
extern "C" {
#endif
   void proc1(size_t var1, size_t var2, size_t var3, size_t var4,size_t var5);
#if defined (__cplusplus)
}
#endif

int main()
{
    proc1(1, 2, 3, 4, 5);
    return 0;
}

; test.asm

option casemap:none
option frame:auto
OPTION WIN64:11 ; same error if using OPTION WIN64:6
OPTION STACKBASE:RSP

.code

proc1 proc public FRAME _rcx : qword, _rdx: qword, _r8: qword, _r9 : qword, other: qword
   LOCAL lvar1 : ptr
   LOCAL lvar2 : XMMWORD

   mov eax, 2.0
   movd xmm0, eax
   shufps xmm0, xmm0,0
   movaps XMMWORD ptr lvar2, xmm0
   
   ret
proc1 endp

end

Compiled in release 2.23 with
hjwasm64" -c -win64 -Zp8 -archSSE test.asm

No problems in JWASM, it aligns correctly lvar2
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 02, 2017, 07:02:16 PM
Ahh I see :)

In JWASM the align setting uses 16 bytes for every local, which is why they're all aligned. We don't do that: we only align the first local, so if you put something first that's a qword it will throw the alignment out. However, it can save a lot of stack by not wasting 16 bytes per local, in some cases in my test procs 1-2 whole cache lines.
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 02, 2017, 07:31:53 PM
Quote from: johnsa on April 02, 2017, 07:02:16 PM
Ahh I see :)

In JWASM the align setting uses 16 bytes for every local, which is why they're all aligned. We don't do that: we only align the first local, so if you put something first that's a qword it will throw the alignment out. However, it can save a lot of stack by not wasting 16 bytes per local, in some cases in my test procs 1-2 whole cache lines.

But this is the whole purpose of bit 2 of the OPTION WIN64 directive, as I have always understood it, because the first local is always 16-byte aligned when it is a 16-byte variable. It always ends up 16-byte aligned as a result of the required stack alignment that takes place during the prolog.
This is valid for STACKBASE:RSP or STACKBASE:RBP.

Note that with bit 2 of OPTION WIN64, JWASM DOES NOT 16-byte align every local, only 16-byte aligns 16-byte variables.


Title: Re: HJWasm 2.23 Release
Post by: nidud on April 02, 2017, 09:37:47 PM
deleted
Title: Re: HJWasm 2.23 Release
Post by: jj2007 on April 02, 2017, 10:51:23 PM
Quote from: aw27 on April 02, 2017, 07:31:53 PM
the first local is always 16 byte aligned when it corresponds to 16 byte variable.

Interesting observation :t

Quote
Note that with bit 2 of OPTION WIN64, JWASM DOES NOT 16-byte align every local, only 16-byte aligns 16-byte variables.

I agree with John that aligning everything is a waste of stack, but aligning only xmmwords would indeed be an intelligent option. For efficient code, we could still use e.g.
Local x0:XMMWORD, x1, x2, x3, x4
Local y0:XMMWORD, y1, y2, y3, y4
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 03, 2017, 01:51:16 AM
Quote from: jj2007 on April 02, 2017, 10:51:23 PM
Local x0:XMMWORD, x1, x2, x3, x4
Local y0:XMMWORD, y1, y2, y3, y4


This is a trap, isn't it?
I think you will not get what you want, if I know what you want.
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 03, 2017, 04:46:06 AM
I've fixed this now.. but for option stackbase:rsp it means you need the value 15.

If you supply any value < 11, it becomes 11 (as this is the most sensible)... except if you specify 15 which it will accept as 15 = 11 + align16 bit set.
So you have 2 options with stackbase:rsp  ... 11 or 15 and that's it.

For stackbase rbp you can use any values between 0-7 as per normal.

This will be in 2.24 shortly along with some new presents :)
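
Editor's sketch: the value rule described in this post can be written down as a small Python model. The function name effective_win64 is hypothetical and this is not HJWasm's actual code. The post only states what happens to values below 11 and to 15 under stackbase:rsp, so treating 12-14 like the rest, and rejecting values outside 0-7 under stackbase:rbp, are assumptions.

```python
def effective_win64(requested: int, stackbase: str) -> int:
    """Model of which win64 value takes effect, following the post above."""
    base = stackbase.lower()
    if base == "rsp":
        # 15 = 11 plus the align-16 bit (bit 2, value 4), accepted as given.
        if requested == 15:
            return 15
        # Anything else collapses to 11. The post only spells this out for
        # values below 11; handling 12-14 the same way is an assumption.
        return 11
    if base == "rbp":
        # The post says 0-7 work "as per normal"; rejecting other values
        # here is an assumption made to keep the model explicit.
        if not 0 <= requested <= 7:
            raise ValueError("stackbase:rbp model only covers win64 values 0-7")
        return requested
    raise ValueError("stackbase must be 'rsp' or 'rbp'")
```

Under this model, the OPTION WIN64:6 plus OPTION STACKBASE:RSP combination reported earlier in the thread would resolve to 11, while the same value under stackbase:rbp would be left at 6.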
Title: Re: HJWasm 2.23 Release
Post by: jj2007 on April 03, 2017, 06:38:57 AM
Quote from: aw27 on April 03, 2017, 01:51:16 AM
This is a trap, isn't it?

No, it's quite real. Just efficient code: you start with a local that can be used with movaps and friends, you add exactly 4 dwords, then the next SSE variable, etc. Hand-crafted but it works, of course.

include \Masm32\MasmBasic\Res\JBasic.inc      ; ## builds in 32- or 64-bit mode with ML, AsmC, JWasm, HJWasm ##
usedeb=1

.code
MyTest proc <cb> uses rsi rdi rbx arg1, arg2, arg3, arg4, arg5
Local x0:XMMWORD, x1, x2, x3, x4
Local y0:XMMWORD, y1, y2, y3, y4
  lea rax, x0
  lea rdx, y0
  deb 4, "are x0+y0 aligned?", x:rax, x:rdx, x:rbp, x:rsp, arg1, arg2, arg3, arg4, arg5
  ret
MyTest endp

  Init
  PrintLine Chr$("This code was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format", 13, 10)
  jinvoke MyTest, 111, 222, 333, 444, 555
  Inkey
EndOfCode


Output:
This code was assembled with HJWasm32 in 64-bit format

are x0+y0 aligned?
x:rax   12feb0h
x:rdx   12fe90h
x:rbp   12fef0h
x:rsp   12fe00h
arg1    111
arg2    222
arg3    333
arg4    444
arg5    555
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 03, 2017, 04:11:54 PM
Quote from: jj2007 on April 03, 2017, 06:38:57 AM
No, it's quite real. Just efficient code: you start with a local that can be used with movaps and friends, you add exactly 4 dwords, then the next SSE variable, etc. Hand-crafted but it works, of course.

My take on this is that we should place the variables in descending order of the size of their TYPE (by powers of 2).
For example:
LOCAL a : XMMWORD
LOCAL b : XMMWORD
LOCAL c : QWORD
LOCAL d : QWORD
LOCAL e[10] : DWORD
LOCAL f : DWORD
LOCAL g[5] : WORD
LOCAL h[20] : BYTE
LOCAL i : BYTE

I believe this will pack as much as possible. On the other hand, I don't think that, in general, it will cause any problem whatsoever to place variables out of order, increasing the memory consumption a little, when the compiler helps us to maintain the alignment. That's why I like bit 2 of OPTION WIN64 when there are XMMWORDs. Maybe the compiler should consider other alignments for stack variables, such as 32, 64, 128, 256 or 512 bytes. I am adding that to my Wish List.
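
Editor's sketch: the packing claim above can be checked with a little arithmetic. The Python below is a simplified model, not what any assembler does: it assumes each local is aligned to its element size, that offsets grow upward from a 16-byte-aligned base, and that nothing else shares the frame. The names layout and DESCENDING are made up for the illustration.

```python
def layout(local_vars):
    """Place locals in order, aligning each to its element size.

    local_vars: list of (name, element_size_in_bytes, element_count).
    Returns (offsets, total_size, padding_bytes).
    """
    offset = 0
    padding = 0
    offsets = {}
    for name, elem_size, count in local_vars:
        misalign = offset % elem_size
        if misalign:
            pad = elem_size - misalign  # bytes skipped to reach alignment
            padding += pad
            offset += pad
        offsets[name] = offset
        offset += elem_size * count
    return offsets, offset, padding


# The LOCAL list from the post, largest TYPE first.
DESCENDING = [
    ("a", 16, 1), ("b", 16, 1),   # XMMWORD
    ("c", 8, 1), ("d", 8, 1),     # QWORD
    ("e", 4, 10), ("f", 4, 1),    # DWORD
    ("g", 2, 5),                  # WORD
    ("h", 1, 20), ("i", 1, 1),    # BYTE
]

_, size_desc, pad_desc = layout(DESCENDING)
_, size_rev, pad_rev = layout(list(reversed(DESCENDING)))
```

In this model the descending order needs no padding at all, while the same locals declared smallest-first pick up a few padding bytes, which matches the point that ordering matters less once the assembler inserts the padding for you.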
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 03, 2017, 04:17:46 PM
Quote from: johnsa on April 03, 2017, 04:46:06 AM
I've fixed this now.. but for option stackbase:rsp it means you need the value 15.

If you supply any value < 11, it becomes 11 (as this is the most sensible)... except if you specify 15 which it will accept as 15 = 11 + align16 bit set.
So you have 2 options with stackbase:rsp  ... 11 or 15 and that's it.

For stackbase rbp you can use any values between 0-7 as per normal.

This will be in 2.24 shortly along with some new presents :)

Looking forward to it.  :t
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 03, 2017, 05:45:26 PM
The change I've made for 2.24 is to just take any local greater than a qword (in size) and align its position up to 16. This happens in order; I don't want to shuffle the local order around in the assembler as it's not transparent to the user, which might in some cases lead to undesirable outcomes, if for example someone writes code that assumes local B occurs after local A and they can be grabbed simultaneously.



MyProc PROC FRAME ....
  LOCAL a:XMMWORD            ; This is aligned 16 in all cases anyway..
  LOCAL b:QWORD
  LOCAL c:XMMWORD            ; This is now aligned to 16 with win64:15
  LOCAL d:DWORD
  LOCAL e:DWORD

  lea rax,e
  mov rax,[rax]              ; something like this where it might want to load in D and E at the same time... not that I'd recommend this.. but you never know!



So the alignment for win64:15 now will just insert the necessary padding in front of a local where it's required.
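
Editor's sketch: the in-order padding rule described in this post, modelled in Python for the five locals of MyProc. It is a simplified model and not HJWasm's implementation: offsets grow upward from a 16-byte-aligned base (the real stack direction is ignored), and the name place_locals is hypothetical. The threshold of 16 bytes follows "greater than a qword" as used in the post.

```python
def place_locals(local_vars, big_align=16):
    """Place locals in declaration order, padding only in front of big ones.

    local_vars: list of (name, size_in_bytes).
    Any local of at least big_align bytes is moved up to a big_align boundary;
    smaller locals are placed back to back. Order is never changed.
    Returns (offsets, padding_bytes).
    """
    offset = 0
    padding = 0
    offsets = {}
    for name, size in local_vars:
        if size >= big_align and offset % big_align:
            pad = big_align - offset % big_align
            padding += pad
            offset += pad
        offsets[name] = offset
        offset += size
    return offsets, padding


# a:XMMWORD, b:QWORD, c:XMMWORD, d:DWORD, e:DWORD as in MyProc above.
offsets, padding = place_locals(
    [("a", 16), ("b", 8), ("c", 16), ("d", 4), ("e", 4)]
)
```

Here the only padding is the 8 bytes inserted in front of c, and d and e stay adjacent, so code that reads both with one 8-byte load still sees them side by side.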
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 03, 2017, 07:11:56 PM
Quote from: johnsa on April 03, 2017, 05:45:26 PM
The change I've made for 2.24 is to just take any local greater than a qword (in size) and align its position up to 16. This happens in order; I don't want to shuffle the local order around in the assembler as it's not transparent to the user, which might in some cases lead to undesirable outcomes, if for example someone writes code that assumes local B occurs after local A and they can be grabbed simultaneously.



MyProc PROC FRAME ....
  LOCAL a:XMMWORD            ; This is aligned 16 in all cases anyway..
  LOCAL b:QWORD
  LOCAL c:XMMWORD            ; This is now aligned to 16 with win64:15
  LOCAL d:DWORD
  LOCAL e:DWORD

  lea rax,e
  mov rax,[rax]              ; something like this where it might want to load in D and E at the same time... not that I'd recommend this.. but you never know!



So the alignment for win64:15 now will just insert the necessary padding in front of a local where it's required.

It should work even for structure and union variables, independently of the alignment set for their fields.
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 03, 2017, 08:28:29 PM
If the structure or union is >= 16 bytes it will, I can adjust this so that it applies it to ANY struct/union, but logically if the structure isn't at least 16 bytes in total size you wouldn't be using any aligned operations against it anyway ?
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 03, 2017, 08:44:12 PM
Quote from: johnsa on April 03, 2017, 08:28:29 PM
If the structure or union is >= 16 bytes it will, I can adjust this so that it applies it to ANY struct/union, but logically if the structure isn't at least 16 bytes in total size you wouldn't be using any aligned operations against it anyway ?

I don't think you should. Users are expected to know that they should align the structure fields to 16 if the structure contains an XMMWORD, because aligning the structure to 16 bytes on the stack does not guarantee that an XMMWORD field inside it will be aligned to 16 bytes.
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 03, 2017, 09:14:18 PM
Yep, well that's how it will be then in 2.24 as currently fixed, only the start address of the structure is guaranteed to be 16, not the fields inside it.
Title: Re: HJWasm 2.23 Release
Post by: powershadow on April 03, 2017, 11:59:35 PM
Hi johnsa.
I am not familiar with x64 asm. This is my first program, so I don't know whether it's my mistake or an hjwasm bug, or maybe it is already fixed in version 2.23, because I tested everything with version 2.22.

Simple source:

.686P

.x64

option casemap :none
option win64 : 11
option frame : auto
option stackbase : rsp

include WINDOWS.INC

includelib user32.lib
includelib Kernel32.Lib

.data
Text db '0123456789',0

.code

ShowMessage proc  FRAME
LOCAL TxtBuff[11]:byte
LOCAL Flag:BOOL

invoke ZeroMemory,addr TxtBuff,sizeof(TxtBuff)
invoke lstrcpy,addr TxtBuff,addr Text

invoke MessageBox,0,addr TxtBuff,"Info",MB_OK

; ---------------------------
; "Info" < Why double quote added to the text ???
; ---------------------------
; 0123456789 < Text is ok.
; ---------------------------
; OK   
; ---------------------------

mov Flag,TRUE

invoke MessageBox,0,addr TxtBuff,"Info",MB_OK

; ---------------------------
; "Info" < Why double quote added to the text ???
; ---------------------------
; 01234567 < "89" was rewritten by "mov Flag,TRUE"
; ---------------------------
; OK   
; ---------------------------

ret

ShowMessage endp


start proc FRAME

invoke ShowMessage
invoke ExitProcess,0

start endp

end start


Besides, could you explain when I need to add "FRAME" after the "proc"?  I can't find any info about it.
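
Editor's sketch: the truncated text in the second message box is what you would expect if Flag were placed on top of the tail of TxtBuff. The Python below reproduces the effect on a plain byte buffer. The overlap offset of 8 is an assumption inferred from the reported output ("01234567"), not something taken from the assembler's actual frame layout.

```python
import struct

FLAG_OFFSET = 8   # assumed position of Flag relative to TxtBuff (hypothetical)
TRUE = 1

frame = bytearray(16)                 # stand-in for the locals area
text = b"0123456789\x00"              # what lstrcpy leaves in TxtBuff
frame[0:len(text)] = text

before = bytes(frame).split(b"\x00", 1)[0]

# mov Flag,TRUE: a 4-byte little-endian store of 1 at the assumed offset.
struct.pack_into("<I", frame, FLAG_OFFSET, TRUE)

after = bytes(frame).split(b"\x00", 1)[0]
```

After the store, the digits "8" and "9" are replaced by the bytes 01 00, so the string ends right after "01234567" plus one non-printing character.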

Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 04, 2017, 12:42:25 AM
Amazingly, I think you have found a bug. I'm working on it now!

So you can leave out .686P (you don't need that)

FRAME is now implied, so you don't have to specify it at all.
Basically, under Win64, procedures should have a pdata/xdata entry, which allows for proper exception handling. Originally jwasm used frame:auto and FRAME on the PROC declaration to specify this behaviour, but along the way it also got roped into how the prologue/epilogue is generated, by inserting .PUSHREG and other exception-frame-related operations. Bottom line: there is no reason not to have it, and to keep the code consistent we've forced it on for all procs.

So you can specify it or not; it doesn't matter anymore as of 2.23.

A PROC is a PROC is a PROC. :)
Title: Re: HJWasm 2.23 Release
Post by: aw27 on April 04, 2017, 03:57:49 AM
Quote from: johnsa on April 04, 2017, 12:42:25 AM
Amazingly.. I think you have found a bug, i'm working on it now !!!
The bug happens as well with option stackbase : rbp, unless you use option win64 : 2 which happens to fix the alignment issue.
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 04, 2017, 04:25:39 AM
This is fixed now too for both rsp and rbp.

I'm just waiting for confirmation from Habran on some of his changes and as soon as he's ready we will put 2.24 up.
Title: Re: HJWasm 2.23 Release
Post by: johnsa on April 04, 2017, 06:42:23 PM
I've put 2.24 up.

The fixes are:
stackbase:rsp overwriting locals in some cases (powershadow's bug #1).
stackbase:rbp now enforces stack allocations, so setting bit 2 of win64 isn't required to get alignment working (aw27's bug).
invoke string literals no longer include the quotes (" ") (powershadow's bug #2).

stackbase:rsp with win64:15 now supports LOCAL alignment to 32 (if you want to use aligned YMMWORDs or greater).
Any structure or union, in fact any local that is >= 16 bytes in size, will be aligned to 16 under both rsp and rbp.

SYSTEM V ABI calls are in now too (this is early-days experimental, and serves as a test case for adding Delphi calls next). It also helps move towards the goal of full OSX and Linux support, now that we have an OSX build of HJWASM too.
To use them, the win64 flags still apply (they will be aliased in future), stackbase:rbp must be used, and a proc is then simply decorated with SYSTEMV:



OPTION ARCH:SSE
nixproc PROC SYSTEMV USES rbx xmm0 arg1:qword, arg2:DWORD, arg3:REAL4
LOCAL mem:DWORD
LOCAL vec:XMMWORD
mov rbx,arg1
mov ecx,arg2
mov eax,10
mov mem,eax
mov ecx,mem
IF @Arch EQ 0
movss xmm10,arg3
movaps xmm0,vec
ELSE
vmovss xmm10,arg3
vmovaps xmm0,vec
ENDIF
ret
nixproc ENDP
OPTION ARCH:AVX