UASM 2.35 Release

Started by johnsa, June 03, 2017, 03:31:49 AM

johnsa

UASM 2.35 is available for download.

Changes (and there are a lot!):

1) Fixed a memory leak in symbol table.

2) Fixed UASM32 crash with listing output, as per other thread details.

3) Improved several SystemV prologue and epilogue generation options.

4) Further refactoring and optimisation of all code-paths for the handling of prologue and epilogue.

5) Optimisation of UASM32 executable and changed compiler+settings.

6) We've implemented completely new handling for movq and vmovq. It has never worked properly since JWasm: it appears that, because of the various operand and addressing-mode combinations, the standard instruction tables can't deal with it properly, so we've special-cased this one to handle all the options. Test piece:



vmovq xmm0,xmm1
movq xmm0,xmm1

movd xmm0,ecx
vmovd xmm0,ecx
movq xmm0,rcx

mov eax,1.0
vmovd xmm0,eax
vpshufd xmm0,xmm0,0

mov eax,2.0
vmovd xmm1,eax
vpshufd xmm1,xmm1,0

vmovq xmm0,rcx

movq xmm0, qword ptr [rcx]
vmovq xmm0, qword ptr [rcx]

movq qword ptr [rcx], xmm0
vmovq qword ptr [rcx], xmm0

movd xmm0, dword ptr [rcx]
vmovd xmm0, dword ptr [rcx]

vmovd [rsi+28], xmm8
vmovq [rsi+28], xmm9
vmovd dword ptr [rsi+28], xmm11
vmovq qword ptr [rsi+28], xmm7



Interestingly, the Intel manuals don't list a MOVQ or VMOVQ xmmN, xmmN mode, but the AMD manuals do, and it works as you would expect: it moves the lower qword and zeroes the upper bits.
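As a rough illustration of the register-to-register behaviour described above, here is a minimal Python sketch that models a 128-bit XMM register as a plain integer. The function name movq_xmm_xmm is hypothetical and only mirrors the semantics stated in the post (low qword copied, upper bits cleared); it says nothing about how UASM encodes the instruction.

```python
MASK64 = (1 << 64) - 1  # selects bits 63:0 of a 128-bit value


def movq_xmm_xmm(src: int) -> int:
    """Model MOVQ/VMOVQ xmm, xmm on a 128-bit value held as a Python int.

    The destination receives the low qword of the source and
    bits 127:64 of the destination become zero.
    """
    return src & MASK64


xmm1 = 0x11112222333344445555666677778888
xmm0 = movq_xmm_xmm(xmm1)
print(f"{xmm0:032X}")  # 00000000000000005555666677778888
```

The previous contents of the destination play no part in the result, which is why the model takes only the source value.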

7) String literals have been brought back in, but to ensure compatibility they must be enabled with OPTION LITERALS:ON (this can be turned off again as required).
Further to this, string literals will only be implemented in an invoke iff the corresponding procedure argument type is :PTR (this conforms with WinInc, LPSTR, etc.).
This allows a procedure to work as follows:

MyProc PROTO :DWORD, :BYTE, :PTR

invoke MyProc, "abcd", 'a', "This is a literal string"
The same applies for wide-character string literals with L"".

8) PROTOS and PROCS can now have a return type specified (if omitted the default for OS/ABI + wordsize is selected). This is totally backwards compatible and has no impact on existing code, however it opens up some useful new options:

8.1) There is now a new built-in variable, @LastReturnType, which is set each time an INVOKE happens to track the return type.
8.2) This has also allowed the creation of a new macro in the macro library, UINVOKE, which can automatically determine and use the correct EXITM based on the return type:


vcmpss xmm0, UINVOKE(myfuncR4, xmm2), xmm1, 0
mov eax, uinvoke(myfuncD, xmm2)
cmp rbx, uinvoke(myfuncQ, xmm3)
vmovaps xmm0, uinvoke(myfuncX, xmm2)


8.3) Return types are specified as follows:



myfuncR4 PROTO REAL4 :REAL4
myfuncD  PROTO DWORD :REAL4

myfuncR4 PROC REAL4 FRAME a:REAL4
xor rax,rax
vmovss xmm0,a
ret
myfuncR4 ENDP

myfuncD PROC DWORD FRAME a:REAL4
xor rax,rax
invoke myfuncQ, xmm0
ret
myfuncD ENDP



The return type can be any supported data-type that would normally be applicable for a return type under the ABI (BYTE, SBYTE, WORD, SWORD, DWORD, SDWORD, QWORD, SQWORD, PTR, REAL4, REAL8, __M128, __m256, __m512, xmmword, oword, ymmword, zmmword ).

The type must be specified on the PROTO immediately before the parameters, without a name or colon.
On the PROC it should be declared after the language specifier and before FRAME and USES.
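To make the idea behind UINVOKE a little more concrete, the following Python sketch shows one way a return type could be mapped to the operand that carries the result. This is only an assumption about the general approach, not the macro's actual implementation: the table, the function name result_operand, and the guess that myfuncQ returns QWORD and myfuncX returns __m128 are all hypothetical and are inferred from the usage lines in the post.

```python
# Hypothetical table: return type -> register that would carry the result.
# Only the four cases exercised in the post's UINVOKE examples are listed;
# the QWORD and __m128 entries assume the types of myfuncQ and myfuncX.
RETURN_REGISTER = {
    "real4": "xmm0",   # vcmpss xmm0, UINVOKE(myfuncR4, xmm2), xmm1, 0
    "dword": "eax",    # mov eax, uinvoke(myfuncD, xmm2)
    "qword": "rax",    # cmp rbx, uinvoke(myfuncQ, xmm3)
    "__m128": "xmm0",  # vmovaps xmm0, uinvoke(myfuncX, xmm2)
}


def result_operand(last_return_type: str) -> str:
    """Pick the operand a UINVOKE-style macro might substitute for the call.

    last_return_type plays the role of @LastReturnType; type names are
    compared case-insensitively, as in the assembler.
    """
    return RETURN_REGISTER[last_return_type.lower()]


for type_name in ("REAL4", "DWORD", "QWORD", "__m128"):
    print(type_name, "->", result_operand(type_name))
```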

9) Several new functions have been added to the macro library:
ASFLOAT, ASDOUBLE, R4P, R8P which all generate something equivalent to: EXITM <REAL4 PTR reg> and this provides additional functionality for point 10.

10) HLL directives now fully support floating point comparisons, examples:



.if(xmm0 < FP4(1.5))
xor eax,eax
.endif
.if(xmm0 > FP4(2))
xor eax,eax
.endif

.if(xmm0 < floatvar1)
xor eax,eax
.endif
.if(xmm0 == floatvar2)
xor eax,eax
.endif

.if(xmm0 < [rdx])
xor eax,eax
.endif
.if(xmm0 == [rdx+rbx])
xor eax,eax
.endif

.if(xmm0 < xmm1)
xor eax,eax
.endif
.if(xmm0 > xmm1)
xor eax,eax
.endif
.if(xmm0 <= xmm1)
xor eax,eax
.endif
.if(xmm0 == xmm3)
xor eax,eax
.endif

.if(real4 ptr xmm0 < xmm1)
xor eax,eax
.endif
.if(xmm0 < real4 ptr xmm1)
xor eax,eax
.endif
.if(asfloat(xmm0) < xmm1)
xor eax,eax
.endif

LOADSD xmm0, 1.0
LOADSD xmm1, 2.0
LOADSD xmm3, 1.0

.if(xmm0 < doublevar1)
xor eax,eax
.endif
.if(xmm0 == doublevar2)
xor eax,eax
.endif
.if(real8 ptr xmm0 < FP8(1.5))
xor eax,eax
.endif
.if(xmm0 < real8 ptr FP8(1.5))
xor eax,eax
.endif
.if(real8 ptr xmm0 < xmm1)
xor eax,eax
.endif
.if(real8 ptr xmm0 > real8 ptr xmm1)
xor eax,eax
.endif
.if(real8 ptr xmm0 <= xmm1)
xor eax,eax
.endif
.if(xmm0 == real8 ptr xmm3)
xor eax,eax
.endif
.if(xmm0 == asdouble(xmm3))
xor eax,eax
.elseif(xmm0 > asdouble(xmm2))
xor eax,eax
.endif

LOADSS xmm0,1.0
LOADSS xmm1,2.0

.while(xmm0 < xmm1)
vaddss xmm0,xmm0,FP4(0.1)
.endw



Cheers!
John


aw27

I had a problem with 32-bit sources.
uasm64 -coff -zt0 dmath32.asm

UASM v2.35, Jun  2 2017, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

DMath32.asm(5543) : Error A2049: Invalid instruction operands
DMath32.asm(8726) : Error A2049: Invalid instruction operands
DMath32.asm(8734) : Error A2049: Invalid instruction operands
DMath32.asm(8830) : Error A2049: Invalid instruction operands
DMath32.asm(8838) : Error A2049: Invalid instruction operands
DMath32.asm(9257) : Error A2049: Invalid instruction operands
DMath32.asm(13122) : Error A2049: Invalid instruction operands
DMath32.asm: 32062 lines, 1 passes, 65 ms, 0 warnings, 7 errors
UASM v2.35, Jun  2 2017, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

.const
align 16
_g_anyconst dd 1.0, 2.0, 3.0, 4.0 ; or any other constant

.code
orps xmm0, _g_anyconst
cmpeqps xmm2, _g_anyconst
divps xmm0, _g_anyconst
andnps xmm2, _g_anyconst

rsala

I must say that the next version of Easy Code 64-bit (probably coming in a couple of days) will be compiled with this new release of UASM64.

Good work John!
:t
EC coder

johnsa

Quote from: aw27 on June 03, 2017, 04:00:31 AM
I had a problem with 32-bit sources.
uasm64 -coff -zt0 dmath32.asm

UASM v2.35, Jun  2 2017, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

DMath32.asm(5543) : Error A2049: Invalid instruction operands
DMath32.asm(8726) : Error A2049: Invalid instruction operands
DMath32.asm(8734) : Error A2049: Invalid instruction operands
DMath32.asm(8830) : Error A2049: Invalid instruction operands
DMath32.asm(8838) : Error A2049: Invalid instruction operands
DMath32.asm(9257) : Error A2049: Invalid instruction operands
DMath32.asm(13122) : Error A2049: Invalid instruction operands
DMath32.asm: 32062 lines, 1 passes, 65 ms, 0 warnings, 7 errors
UASM v2.35, Jun  2 2017, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

.const
align 16
_g_anyconst dd 1.0, 2.0, 3.0, 4.0 ; or any other constant

.code
orps xmm0, _g_anyconst
cmpeqps xmm2, _g_anyconst
divps xmm0, _g_anyconst
andnps xmm2, _g_anyconst

I've checked HJWasm and the original JWasm and they behave the same: the operand is type-checked, so you'd need to specify xmmword ptr, e.g.:
orps xmm0, xmmword ptr _g_anyconst (or change the data type)

aw27

Quote from: johnsa on June 03, 2017, 05:40:20 AM
I've checked HJWasm and the original JWasm and they behave the same: the operand is type-checked, so you'd need to specify xmmword ptr, e.g.:
orps xmm0, xmmword ptr _g_anyconst (or change the data type)

MASM does not produce errors.  :P
But we have a problem of inconsistency here: lots of similar instructions do not produce the error in UASM but do produce it in JWasm.
For example:
addps xmm7, Some_Constant
mulps xmm7, Some_Constant
movaps xmm0, Some_Constant




johnsa

I tend to agree. There is a specific piece of logic: when using the -Zm switch (MASM compatibility), there is a select set of SSE instructions that we automatically convert.
This was based on a request.

So without -Zm they'll all give a type-checked error; with that switch on, this group will be automatically promoted to assume xmmword ptr:

CodeInfo.token == T_SUBPD) ||(CodeInfo.token == T_SUBPS) || (CodeInfo.token == T_ADDPS) ||  (CodeInfo.token == T_ADDPD) || (CodeInfo.token == T_MULPD) || (CodeInfo.token == T_MULPS) || (CodeInfo.token == T_ANDPD) || (CodeInfo.token == T_ANDPS) || (CodeInfo.token == T_MOVAPD) || (CodeInfo.token == T_MOVAPS) || (CodeInfo.token == T_MOVUPS))

So for consistency, the behaviour without -Zm is right. With -Zm we could just promote all SSE instructions rather than just a select few. I'm open to that idea; it was only really ever added as a user request. :)
Will make a note to look into it.
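The rule described in this post can be sketched in a few lines of Python. The set below is copied from the token list quoted above; the function name needs_xmmword_override and its parameters are hypothetical. The sketch models only the behaviour as johnsa describes it, not the actual UASM source.

```python
# Mnemonics taken from the token list in the post (T_SUBPD, T_SUBPS, ...).
PROMOTED_WITH_ZM = frozenset({
    "subpd", "subps", "addps", "addpd", "mulpd", "mulps",
    "andpd", "andps", "movapd", "movaps", "movups",
})


def needs_xmmword_override(mnemonic: str, zm: bool) -> bool:
    """Return True if a memory operand that is not typed XMMWORD would
    need an explicit 'xmmword ptr' under the rule described in the post."""
    promoted = zm and mnemonic.lower() in PROMOTED_WITH_ZM
    return not promoted


# The four instructions from the original report are not in the set,
# so under this rule they need the override even with -Zm.
for op in ("orps", "cmpeqps", "divps", "andnps", "addps", "movaps"):
    print(op, needs_xmmword_override(op, zm=True))
```

Promoting every SSE instruction under -Zm, as suggested in the post, would amount to dropping the set membership test.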

jj2007

IMO Uasm is right to complain, as movups loads an OWORD - there is no other encoding afaik.

However, when testing it with an OWORD, it assembles but I don't really like what I see in Olly ::)

include \masm32\include\masm32rt.inc
.686p
.xmm
.const
align 16
_g_anyconst OWORD 11111111.0, 22222222.0, 33333333.0, 44444444.0 ; or any other constant

.code
start:
int 3
movaps xmm0, _g_anyconst
exit
end start

habran

I am not really clear what the problem is here :icon_eek:
This is the code I have assembled with both uasm32 and uasm64; they assemble it without complaint and produce proper output:

.686p
.xmm
.model flat, stdcall
option casemap:none
.const
align 16
Some_Constant dd 1.0, 2.0, 3.0, 4.0

.code
start:
addps xmm0, Some_Constant
ret
end start
output:
--- w32test.asm ----------------------------------------------------------------
     1: .686p
     2: .xmm
     3: .model flat, stdcall
     4: option casemap:none
     5: .const
     6: align 16
     7: Some_Constant dd 1.0, 2.0, 3.0, 4.0
     8:
     9: .code
    10: start:
    11: addps xmm0, Some_Constant
00F51000 0F 58 05 40 30 F5 00 addps       xmm0,xmmword ptr ds:[0F53040h] 
    12: ret                               ;XMM0 = 4080000040400000400000003F800000
00F51007 C3                   ret 
--- No source file -------------------------------------------------------------

Cod-Father

jj2007

Check what the movaps xmm0, _g_anyconst loads after the int 3 in my example (expected value is 11111111.0).

habran

I am not into floating point very much; however, the way you presented the values doesn't look right to me :dazzled:
It looks like, if you use something like this:
Some_Constant dd 1.11111111, 2.22222222, 3.33333333, 4.44444444

we get the proper interpretation in register:
XMM0 = 408E38E440555555400E38E43F8E38E4

XMM00 = +1.11111E+000      XMM01 = +2.22222E+000      XMM02 = +3.33333E+000      XMM03 = +4.44444E+000     

I hope rrr314159 could shed some light on this stuff :biggrin:
Cod-Father

aw27

Quote from: jj2007 on June 03, 2017, 09:06:43 AM
IMO Uasm is right to complain, as movups loads an OWORD - there is no other encoding afaik.
It is not right to complain, movups = Move Unaligned Packed Single-Precision Floating-Point Values
It expects 4 floating point values.  :P
It can also swallow an OWORD/XMMWORD, but what you expect may not be what you get, as you noticed. MASM does not accept OWORDs; OWORDs are not XMMWORDs, although they appear to be!
UASM accepts
Some_Constant XMMWORD 3f800000400000004040000040800000h
as synonymous for
Some_Constant dd 1.0, 2.0, 3.0, 4.0

I prefer the 2nd declaration.
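The lane order of such a constant can be checked outside the assembler. The Python sketch below packs four single-precision values the way a dd list lays them out in memory (first value at the lowest address) and prints the result as one 128-bit number. It assumes that an XMMWORD hex initializer is stored as a little-endian 128-bit integer; under that assumption, a literal written as 3f800000400000004040000040800000h would place 4.0 in the lowest lane, while the literal with 1.0 in the lowest lane matches the XMM0 dump shown earlier in the thread. The helper names are hypothetical.

```python
import struct


def pack_lanes(values):
    """Pack four floats as 'dd a, b, c, d' would (a at the lowest address)
    and return the 16 bytes read back as a little-endian 128-bit integer."""
    raw = struct.pack("<4f", *values)
    return int.from_bytes(raw, "little")


def unpack_lanes(value):
    """Split a 128-bit integer into four floats, lowest lane first."""
    return struct.unpack("<4f", value.to_bytes(16, "little"))


print(f"{pack_lanes([1.0, 2.0, 3.0, 4.0]):032X}")
# 4080000040400000400000003F800000

print(unpack_lanes(0x3F800000400000004040000040800000))
# (4.0, 3.0, 2.0, 1.0)
```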

aw27

Quote from: johnsa on June 03, 2017, 07:54:23 AM
I tend to agree. There is a specific piece of logic: when using the -Zm switch (MASM compatibility), there is a select set of SSE instructions that we automatically convert.
This was based on a request.

So without -Zm they'll all give a type-checked error; with that switch on, this group will be automatically promoted to assume xmmword ptr:

CodeInfo.token == T_SUBPD) ||(CodeInfo.token == T_SUBPS) || (CodeInfo.token == T_ADDPS) ||  (CodeInfo.token == T_ADDPD) || (CodeInfo.token == T_MULPD) || (CodeInfo.token == T_MULPS) || (CodeInfo.token == T_ANDPD) || (CodeInfo.token == T_ANDPS) || (CodeInfo.token == T_MOVAPD) || (CodeInfo.token == T_MOVAPS) || (CodeInfo.token == T_MOVUPS))

So for consistency, the behaviour without -Zm is right. With -Zm we could just promote all SSE instructions rather than just a select few. I'm open to that idea; it was only really ever added as a user request. :)
Will make a note to look into it.

I don't see any logic because there is no possible confusion, the instructions can only deal with 4 floating point values. No need to tell the assembler "hey, you are going to receive 4 floating point values".
BTW, when assembling 64-bit this problem does not arise.

This is different from
myDword dd 11111111h
mov byte ptr myDword
The variable is declared as a dword, so the assembler helps you to track a possible error.

But there is a serious bug (in my opinion):
If you declare
Some_Constant db 1.0, 2.0, 3.0, 4.0
addps xmm0, Some_Constant
does not produce an error in UASM, it does in MASM.


johnsa

Quote from: jj2007 on June 03, 2017, 09:06:43 AM
IMO Uasm is right to complain, as movups loads an OWORD - there is no other encoding afaik.

However, when testing it with an OWORD, it assembles but I don't really like what I see in Olly ::)

include \masm32\include\masm32rt.inc
.686p
.xmm
.const
align 16
_g_anyconst OWORD 11111111.0, 22222222.0, 33333333.0, 44444444.0 ; or any other constant

.code
start:
int 3
movaps xmm0, _g_anyconst
exit
end start


OWORD = 16 bytes, so that is effectively declaring an array of four 16-byte (OWORD) values, for which a single float initializer makes no sense, hence it just produces zero.
ML and UASM will both allow you to declare this, but the result is garbage in both cases.

IMHO, for declaring SIMD data types there are several approaches, in order of preference:

1) MyVector __m128 <>    ; uasm only, but offers the best possible experience as you can use them exactly like C
    MyVector __m128.i32  { <1,2,3,4> }
    MyVector __m128.d64 ... and so on, and in the debugger you get a proper structure with sub elements of all the types, so you can easily interpret the register contents when working with floating point, integer, byte etc.

2) Use a list of REAL4, REAL8,  dq, dd
   MyVector REAL4 1.0, 2.0, 3.0, 4.0 
   MyVector dd 1.0, 2.0, 3.0, 4.0

3) Use an XMMWORD, YMMWORD or ZMMWORD data type.

johnsa

Quote from: aw27 on June 03, 2017, 02:36:55 PM
But there is a serious bug (in my opinion):
If you declare
Some_Constant db 1.0, 2.0, 3.0, 4.0
addps xmm0, Some_Constant
does not produce an error in UASM, it does in MASM.

I think it's all a bit hit and miss, for example the following will produce an error in ML:


somelist db 1,2,3,4
addps xmm0,somelist


this will work


somelist dd 1,2,3,4
addps xmm0,somelist


this however ALSO works (which is also potentially wrong/dangerous):


somelist dd 1
addps xmm0,somelist


It also depends on the masm version, older versions accept everything in every form without any error.

This doesn't work in masm or uasm without an override:


somelist db 1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4
movdqa xmm0,somelist


and that is something I use often, and is no less correct or more incorrect than any of the other examples either.

So it would appear that all ML does is check the element size: if it's a dd/real4 it just allows it, regardless of whether the variable is actually 16 bytes in size, and so it wouldn't allow dw or db.

My thinking here is, because neither works in the most predictable way, we can work out the size of the variable, so in all these cases
dd 1,2,3,4
dd 1.0, 2.0, 3.0, 4.0
db 1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4
real4 1,2,3,4

and so on.. the size is 16, so we could use that exclusively as the deciding factor on whether or not a type override is required.
That would allow you to omit it in almost all cases.
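The size-based proposal can be sketched as follows. This is a small Python model under stated assumptions: the directive table, the function names and the 16-byte operand size for XMM instructions are illustrative, and ml_accepts encodes only the element-size behaviour of ML as inferred in this post, not a documented rule.

```python
# Element sizes in bytes for the data directives mentioned in the thread.
ELEMENT_SIZE = {"db": 1, "dw": 2, "dd": 4, "real4": 4, "dq": 8, "real8": 8}


def variable_size(directive: str, count: int) -> int:
    """Total size in bytes of a variable declared as 'directive v1, v2, ...'."""
    return ELEMENT_SIZE[directive.lower()] * count


def needs_override_by_size(directive: str, count: int, operand_size: int = 16) -> bool:
    """Proposed rule: an override is needed only when the variable's total
    size differs from the size the instruction reads (16 for XMM)."""
    return variable_size(directive, count) != operand_size


def ml_accepts(directive: str) -> bool:
    """ML's apparent rule as inferred in the post: only the element size
    is checked, so dd/real4 pass regardless of how many elements exist."""
    return ELEMENT_SIZE[directive.lower()] == 4


print(needs_override_by_size("dd", 4))    # False: 16 bytes
print(needs_override_by_size("db", 16))   # False: 16 bytes
print(needs_override_by_size("dd", 1))    # True: only 4 bytes
print(ml_accepts("dd"), ml_accepts("db")) # True False
```

Under the proposed rule the dangerous single-dword case is rejected while the 16-byte db list is accepted, which is the reverse of what the element-size check does for both.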

aw27

Quote from: johnsa on June 03, 2017, 06:01:17 PM
My thinking here is, because neither works in the most predictable way, we can work out the size of the variable, so in all these cases
dd 1,2,3,4
dd 1.0, 2.0, 3.0, 4.0
db 1,2,3,4, 1,2,3,4, 1,2,3,4, 1,2,3,4
real4 1,2,3,4

and so on.. the size is 16, so we could use that exclusively as the deciding factor on whether or not a type override is required.
That would allow you to omit it in almost all cases.
Looking at the size looks good.  :t