HJWASM 2.17 bugs

Started by powershadow, December 29, 2016, 03:53:22 AM

powershadow

Hi Branislav & John. You are doing a great job for the whole asm community. I really like hjwasm, but I found several bugs (tested on hjwasm32.exe only).

Bug#1: Incorrect "while" parsing.

Simple source:
start:
xor edx,edx
mov ecx,1
.while (edx ==0 || edx ==1) && (SDWORD ptr ecx >=0)
inc edx
dec ecx
.endw
.if edx == 2
invoke MessageBox,0,0,0,MB_OK
.else
invoke MessageBox,0,0,0,MB_ICONERROR
.endif
invoke ExitProcess,0

end start


ML.exe (6.14.8444) code:

start:
xor edx, edx
mov ecx, 1
jmp While_check
do_while:
inc edx
dec ecx
While_check:
or edx, edx
jz While_check2
cmp edx, 1
jnz jmp_from_while
While_check2:
cmp ecx, 0
jge do_while
jmp_from_while:


HJWasm v2.17, Dec  5 2016 code:

start:
xor edx, edx
mov ecx, 1
jmp While_check
do_while:
inc edx
dec ecx
While_check:
cmp edx, 0
jz jmp_from_while
cmp edx, 1
jnz jmp_from_while
cmp ecx, 0
jge do_while
jmp_from_while:
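
The two listings differ only in where the first conditional jump goes. As a minimal sketch (the function names are made up for illustration, and the registers are modelled as plain ints), both control flows can be replayed in C to see why the ML code ends with edx = 2 while the HJWasm 2.17 code leaves the loop before the first iteration:

```c
#include <stdio.h>

/* Register pair modelled as plain ints (illustrative only). */
typedef struct { int edx; int ecx; } Regs;

/* ML 6.14 listing: "jz While_check2" skips the second edx test,
   so the loop runs while (edx == 0 || edx == 1) && ecx >= 0. */
static Regs run_ml_listing(void) {
    Regs r = { 0, 1 };               /* xor edx,edx / mov ecx,1 */
    while ((r.edx == 0 || r.edx == 1) && r.ecx >= 0) {
        r.edx++;                     /* inc edx */
        r.ecx--;                     /* dec ecx */
    }
    return r;
}

/* HJWasm 2.17 listing: "jz jmp_from_while" leaves the loop when edx == 0,
   so the loop runs only while edx != 0 && edx == 1 && ecx >= 0. */
static Regs run_hjwasm217_listing(void) {
    Regs r = { 0, 1 };
    while (r.edx != 0 && r.edx == 1 && r.ecx >= 0) {
        r.edx++;
        r.ecx--;
    }
    return r;
}

/* Print both results side by side. */
static void print_comparison(void) {
    Regs ml = run_ml_listing();
    Regs hj = run_hjwasm217_listing();
    printf("ML 6.14:     edx=%d ecx=%d\n", ml.edx, ml.ecx);
    printf("HJWasm 2.17: edx=%d ecx=%d\n", hj.edx, hj.ecx);
}
```

Calling print_comparison() from a main function reports edx=2, ecx=-1 for the ML flow and edx=0, ecx=1 for the HJWasm 2.17 flow, so the following ".if edx == 2" takes a different branch in each build.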


Bug#2: For Win64, the error "Register value overwritten by INVOKE" is missing in many cases for the registers rcx (ecx) and rdx (edx).

Simple case:
TestProc1 proc Par1:QWORD,Par2:QWORD,Par3:QWORD,Par4:QWORD,Par5:QWORD

ret

TestProc1 endp

start:
mov rcx,555h
mov rdx,666h
invoke TestProc1,111h,rcx,rdx,0,0
end start

40001005:   mov rcx, 0x555
4000100C:   mov rdx, 0x666
40001013:   mov rcx, 0x111
4000101A:   mov rdx, rcx
4000101D:   mov r8, rdx
40001020:   xor r9d, r9d
40001023:   mov qword [rsp+0x20], 0x0
4000102C:   call 0x40001000
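
Reading the disassembly top to bottom, rcx is overwritten with 111h before it is copied to rdx, and rdx is overwritten before it is copied to r8. The following C sketch replays the moves; the struct and function names are hypothetical, and the reversed ordering is only one possible way to avoid the clobbering for this particular call, not what any assembler is known to emit:

```c
#include <stdint.h>

/* The four Win64 register argument slots (illustrative model). */
typedef struct { uint64_t rcx, rdx, r8, r9; } ArgRegs;

/* Replays the moves exactly in the order shown in the disassembly. */
static ArgRegs load_args_as_emitted(uint64_t rcx, uint64_t rdx) {
    ArgRegs a = { rcx, rdx, 0, 0 };
    a.rcx = 0x111;     /* mov rcx, 0x111  (old rcx value is lost here) */
    a.rdx = a.rcx;     /* mov rdx, rcx    (copies 0x111, not 0x555)    */
    a.r8  = a.rdx;     /* mov r8, rdx     (copies 0x111, not 0x666)    */
    a.r9  = 0;         /* xor r9d, r9d                                 */
    return a;
}

/* Hypothetical alternative: fill the slots from last to first, so each
   source register is read before it is overwritten. */
static ArgRegs load_args_reversed(uint64_t rcx, uint64_t rdx) {
    ArgRegs a = { rcx, rdx, 0, 0 };
    a.r9  = 0;
    a.r8  = a.rdx;     /* reads the original rdx (0x666) */
    a.rdx = a.rcx;     /* reads the original rcx (0x555) */
    a.rcx = 0x111;
    return a;
}
```

With rcx = 555h and rdx = 666h, the emitted order passes 111h in rcx, rdx and r8, while the reversed order passes 111h, 555h, 666h. Reversing the order is enough here but would not resolve every case (for example, two arguments that swap registers), which is presumably why MASM-style assemblers report an error instead.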


Bug#3: Stack frame creation error (HJWasm v2.17, Dec  9 2016); HJWasm v2.13 works fine.
Simple source:

func proc
LOCAL Buff[128]:byte

xor eax,eax
ret

func endp

55 PUSH EBP
8BEC MOV EBP,ESP
81C4 7FFFFFFF ADD ESP,-81 ;???????????
33C0 XOR EAX,EAX
8BE5 MOV ESP,EBP
5D POP EBP
C3 RET
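
To see what is wrong with the ADD ESP line, the immediate can be decoded by hand. This is a small C sketch (the helper name is made up for illustration) that reads the four immediate bytes of "81C4 7FFFFFFF" as a little-endian signed 32-bit value:

```c
#include <stdint.h>

/* Decode a little-endian 32-bit immediate as a signed value. */
static int64_t decode_imm32_le(const uint8_t b[4]) {
    uint32_t u = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    /* Two's complement: values with bit 31 set are negative. */
    return (u & 0x80000000u) ? (int64_t)u - 4294967296LL : (int64_t)u;
}

/* Immediate bytes of "81C4 7FFFFFFF" (ADD ESP, imm32) from the listing. */
static const uint8_t bug3_imm[4] = { 0x7F, 0xFF, 0xFF, 0xFF };
```

decode_imm32_le(bug3_imm) yields -129 (that is, -81h), so the frame is one byte larger than the 128-byte buffer and ESP is no longer DWORD-aligned.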



Bug#4: Source mode debugging has been completely broken in releases after HJWasm v2.13, Feb 13 2016.
In some of my projects converted from masm to hjwasm, the /Zd parameter does not generate line number debug information. In the code section, "number of line numbers" = 1, and LINENUMBERS information is present only for the first line.

HJWasm v2.13, Feb 13 2016    - source debugging works perfectly.
HJWasm v2.15r2, Oct  1 2016   - source debugging not working.
HJWasm v2.15, Sep  6 2016   - source debugging not working.
HJWasm v2.17, Dec  9 2016    - source debugging not working.

All in all, HJWasm v2.13 is more stable (Bug#3 and Bug#4 are not present).

jj2007

Quote from: powershadow on December 29, 2016, 03:53:22 AM
Hi Branislav & John. You are doing a great job for the whole asm community.

Indeed :t

p1_s: xor edx,edx
mov ecx,1
.while (edx ==0 || edx ==1) && (SDWORD ptr ecx >=0)
inc edx
dec ecx
.endw
p1_endp: nop

deb 4, "Res", edx, ecx
CodeSize p1

p2_s: xor edx,edx
mov ecx,1
jmp @test

@more: inc edx
dec ecx
@test: test edx, edx
je @ecx
cmp edx, 1
jne @out
@ecx: test ecx, ecx
jns @more
p2_endp: nop

@out: deb 4, "Res", edx, ecx
CodeSize p2


Correct output (ML 6.15):
Res
edx             2
ecx             -1
25      bytes for p1

Res
edx             2
ecx             -1
24      bytes for p2


Under the hood:
HJWasm32 2.17 & AsmC
00401083   CC          int3
00401084   EB 02       jmp short 00401088
00401086   42          inc edx
00401087   49          dec ecx
00401088   83FA 00     cmp edx, 0
0040108B   74 0A       je short 00401097
0040108D   83FA 01     cmp edx, 1
00401090   75 05       jne short 00401097
00401092   83F9 00     cmp ecx, 0
00401095   7D EF       jge short 00401086
00401097   90          nop

ML 6.15
00401083   CC          int3
00401084   EB 02       jmp short 00401088
00401086   42          inc edx
00401087   49          dec ecx
00401088   0BD2        or edx, edx
0040108A   74 05       jz short 00401091
0040108C   83FA 01     cmp edx, 1
0040108F   75 05       jne short 00401096
00401091   83F9 00     cmp ecx, 0
00401094   7D F0       jge short 00401086
00401096   90          nop


Since AsmC has the same problem, it seems to be a WatCom issue...
Check the handling of brackets; this produces correct results:
   .while edx ==0 || edx ==1 && SDWORD ptr ecx >=0

In any case, SDWORD ptr ecx >=0 could be translated to the shorter test + jns. ML code is already one byte shorter because of or edx, edx instead of cmp edx, 0; but to be honest, the coder can force the shorter version with
.while (!edx || edx ==1) && SDWORD ptr ecx >=0

nidud

#2
deleted

habran

That problem is fixed in the next release, 2.18, which will be out as soon as Johnsa gets the time to upload it; we had to add an additional label.
We have also changed 'cmp edx, 0' to 'test edx,edx', reducing it from 3 bytes to 2. We were torn between OR and TEST but decided that TEST is more appropriate in this case:

start:   
    xor edx,edx
00C21000 33 D2                xor         edx,edx 
00C21002 EB 02                jmp         $$$00001+6h (0C21006h) 
@C0001:
00C21004 42                   inc         edx 
00C21005 49                   dec         ecx 
@C0004:
00C21006 85 D2                test        edx,edx 
00C21008 74 05                je          $$$00001+0Fh (0C2100Fh) 
00C2100A 83 FA 01             cmp         edx,1 
00C2100D 75 05                jne         $$$00001+14h (0C21014h) 
@C0002:
00C2100F 83 F9 00             cmp         ecx,0 
00C21012 7D F0                jge         $$$00001+4h (0C21004h) 
@C0003:
00C21014 83 FA 03             cmp         edx,3 
00C21017 75 0F                jne         $$$00001+28h (0C21028h) 
Cod-Father

habran

To make code more optimised in this case:
.while (edx ==0 || edx ==1) && (SDWORD ptr ecx >=0)

it should be better to write:
.while   (SDWORD ptr ecx >=0) && (edx ==0 || edx ==1)
Quote
start:   
    xor edx,edx
00D81000 33 D2                xor         edx,edx 
00D81002 EB 02                jmp         $$$00001+6h (0D81006h) 
@C0001:
00D81004 42                   inc         edx 
00D81005 49                   dec         ecx 
@C0004:
00D81006 83 F9 00             cmp         ecx,0 
00D81009 7C 09                jl          $$$00001+14h (0D81014h) 
@C0002:
00D8100B 85 D2                test        edx,edx 
00D8100D 74 F5                je          $$$00001+4h (0D81004h) 
00D8100F 83 FA 01             cmp         edx,1 
00D81012 74 F0                je          $$$00001+4h (0D81004h) 
@C0003:
00D81014 83 FA 03             cmp         edx,3 
00D81017 75 0F                jne         $$$00001+28h (0D81028h)

habran

Fix is in hll.c
Quote
static ret_code GetAndExpression(struct hll_item *hll, int *i, struct asm_tok tokenarray[], int ilabel, bool is_true, char *buffer, struct hll_opnd *hllop)
/***********************************************************************************************************************************************************/
{
  char *ptr = buffer;
  uint_32 truelabel = 0;
  uint_32 truelabel2 = 0;  // added HJWasm 2.18
  //char buff[16];
  //char *nlabel;
  //char *olabel;

  DebugMsg1(("%u GetAndExpression(>%.32s< buf=>%s<) enter\n", evallvl, tokenarray[*i].tokpos, buffer));

  if (ERROR == GetSimpleExpression(hll, i, tokenarray, ilabel, is_true, ptr, hllop))
    return(ERROR);
  while (COP_AND == GetCOp(&tokenarray[*i])) {
    (*i)++;
    DebugMsg1(("%u GetAndExpression: &&-operator found, is_true=%u, lastjmp=%s\n", evallvl, is_true, hllop->lastjmp ? hllop->lastjmp : "NULL"));
    if (is_true) {
      /* todo: please describe what's done here and why! */
      if (hllop->lastjmp) {
        char *p = hllop->lastjmp;
        InvertJump(p);          /* step 1 */
        if (truelabel == 0){     /* step 2 */
          if (tokenarray[*i-2].token == T_CL_BRACKET) // if '&&' is between closed bracket like ')&&'
          truelabel2 = GetHllLabel();                 // init truelabel2 for second test
          truelabel = GetHllLabel();
          }
        if (*p ) {              /* v2.11: there might be a 0 at lastjmp */
          p += 4;               /* skip 'jcc ' or 'jmp ' */
          if (truelabel > 0) GetLabelStr(truelabel, p); // get string for second test
          else GetLabelStr(truelabel2, p);
          strcat(p, EOLSTR);                            // @C0002:
        }
        DebugMsg1(("%u GetAndExpression: jmp inverted >%s<\n", evallvl, hllop->lastjmp));
        if (truelabel2 > 0)
          ReplaceLabel(buffer, GetLabel(hll, ilabel), truelabel2);
        else
          ReplaceLabel(buffer, GetLabel(hll, ilabel), truelabel);
        hllop->lastjmp = NULL;
      }
    }
    if (tokenarray[*i-2].token == T_CL_BRACKET){
      if (truelabel2 > 0) {
        ptr += strlen(ptr);
        GetLabelStr(truelabel2, ptr);
        strcat(ptr, LABELQUAL EOLSTR);
        DebugMsg1(("%u GetAndExpression: label added >%s<\n", evallvl, ptr));
        hllop->lastjmp = NULL;
        truelabel2 = 0;
        }
      }
    ptr += strlen(ptr);
    hllop->lasttruelabel = 0; /* v2.08 */
    if (ERROR == GetSimpleExpression(hll, i, tokenarray, ilabel, is_true, ptr, hllop))
      return(ERROR);
  };
   //here is label created
  if (truelabel > 0) {
    ptr += strlen(ptr);
    GetLabelStr(truelabel, ptr);
    strcat(ptr, LABELQUAL EOLSTR);
    DebugMsg1(("%u GetAndExpression: label added >%s<\n", evallvl, ptr));
    hllop->lastjmp = NULL;
  }
  else {
      if (truelabel2 > 0) {
        ptr += strlen(ptr);
        GetLabelStr(truelabel2, ptr);
        strcat(ptr, LABELQUAL EOLSTR);
        DebugMsg1(("%u GetAndExpression: label added >%s<\n", evallvl, ptr));
        hllop->lastjmp = NULL;
        }
    }
  return(NOT_ERROR);
}


habran

Bug#2:
It is not a bug; I did it on purpose so that we can use the same register several times in a function call, like:
xor edx,edx
invoke SomeFunction, rcx, rdx, edx, edx, rdx
instead of:
invoke SomeFunction, rcx, NULL, 0, 0, NULL

If that bothers someone I can create an OPTION, something like: WORNREGOVERWRITTEN
It can be default WORNREGOVERWRITTEN 1
and not         WORNREGOVERWRITTEN 0

habran

I have yet to check Bug #3. Bug #4 is half true: if you use the original HJWasm from Terraspace, it will show source code and line numbers because it is built with msvc2005ddk. Visual Studio produces different debug info, which HJWasm doesn't support.
So, to have a fast and small exe with proper debug info, you need to build it with msvc2005ddk.

habran

I'll investigate why Bug #3 is happening; it shouldn't be anything complicated.
When I add one additional local before the buffer, it works fine:

Quote
func proc
LOCAL var :DWORD
LOCAL Buff[128]:byte

   xor eax,eax
   ret
func endp


0113103C 55                           push       ebp 
0113103D 8B EC                      mov        ebp,esp 
0113103F 81 EC 84 00 00 00    sub         esp,84h 
01131045 33 C0                       xor         eax,eax 
01131047 8B E5                       mov        esp,ebp 
01131049 5D                           pop         ebp 
0113104A C3                           ret

habran

So, all of the issues above will be fixed in the 2.18 release.
To prove that HJWasm, when built with msvc2005ddk, produces correct debug info, here is an example:
Quote
    11: start:
    12:     call func
002b1010 E8 3C 00 00 00                   call         0x2b1051 
    13:     xor edx,edx
002b1015 33 D2                            xor         edx, edx 
    14:     .while   (SDWORD ptr ecx >=0) && (edx ==0 || edx ==1)
002b1017 EB 02                            jmp         0x2b101b 
    15:         inc edx
002b1019 42                               inc         edx 
    16:         dec ecx
002b101a 49                               dec         ecx 
    17:     .endw
002b101b 83 F9 00                         cmp         ecx, 0x0 
002b101e 7C 09                            jl         0x2b1029 
002b1020 85 D2                            test         edx, edx 
002b1022 74 F5                            jz         0x2b1019 
002b1024 83 FA 01                         cmp         edx, 0x1 
002b1027 74 F0                            jz         0x2b1019 
    18:     .if edx == 3
002b1029 83 FA 03                         cmp         edx, 0x3 
002b102c 75 0F                            jnz         0x2b103d 
    19:         invoke MessageBoxA,0,0,0,3       
002b102e 6A 03                            push         0x3 
002b1030 6A 00                            push         0x0 
002b1032 6A 00                            push         0x0 
002b1034 6A 00                            push         0x0 
002b1036 E8 39 10 00 00                   call         0x2b2074 
    20:     .else
002b103b EB 0D                            jmp         0x2b104a 
    21:         invoke MessageBoxA,0,0,0,1
002b103d 6A 01                            push         0x1 
002b103f 6A 00                            push         0x0 
002b1041 6A 00                            push         0x0 
002b1043 6A 00                            push         0x0 
002b1045 E8 2A 10 00 00                   call         0x2b2074 
    22:     .endif
    23:     invoke ExitProcess,0
002b104a 6A 00                            push         0x0 
002b104c E8 1D 10 00 00                   call         0x2b206e 
    24:

jj2007

Quote from: jj2007 on December 29, 2016, 05:13:52 AM
SDWORD ptr ecx >=0 could be translated to the shorter test + jns

What do you think about this optimisation?

p2_s: xor edx,edx
      mov ecx,1
      jmp @test

@more: inc edx
      dec ecx
@test: test edx, edx
      je @ecx
      cmp edx, 1
      jne @out
@ecx: test ecx, ecx
      jns @more

p2_endp: nop

habran

@ecx: test ecx, ecx
      jns @more
It is certainly better than cmp ecx, 0; however, the code can be optimized further and made faster if we do this:


Quote
p2_s:  xor edx,edx
       mov ecx,1
       jmp @test

@more: inc edx
       dec ecx
@test: test ecx, ecx
       js @out
       test edx, edx
       je @more
       cmp edx, 1
       je @more
@out:  nop


jj2007

Quote from: habran on December 30, 2016, 06:20:41 AM
It is certainly better than cmp ecx, 0; however, the code can be optimized further and made faster if we do this:

You mentioned that above:
Quote from: habran on December 29, 2016, 02:36:57 PM
To make code more optimised in this case:
.while (edx ==0 || edx ==1) && (SDWORD ptr ecx >=0)

it should be better to write:
.while   (SDWORD ptr ecx >=0) && (edx ==0 || edx ==1)

But the question here was rather if SDWORD ptr ecx >=0 and edx==0 could generally be translated to test+jns and test+jne instead of the longer comparison with zero. Purists will say "that is not what I wrote", of course - but I can't see a situation where this optimisation would cause misbehaving code. And purists would never use .While ... .Endw anyway ;)

Btw ML 6.15 does this for edx==0 above - see or edx, edx in the disassembly.
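
For what it's worth, the equivalence can be checked with a small flag model. The C sketch below is only an illustration (the type and function names are made up): it computes ZF, SF and OF for "cmp x, 0" and for "test x, x" and compares the branch decisions of jge versus jns and of je in both cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* The three flags that matter for je, jns and jge. */
typedef struct { bool zf, sf, of; } Flags;

/* Flags after "cmp a, b" (a 32-bit subtraction whose result is discarded). */
static Flags flags_cmp(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    /* Signed overflow: operands differ in sign and the result's sign differs from a. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}

/* Flags after "test a, b" (a bitwise AND; OF is always cleared). */
static Flags flags_test(uint32_t a, uint32_t b) {
    uint32_t r = a & b;
    Flags f = { r == 0, (r >> 31) != 0, false };
    return f;
}

static bool jge_taken(Flags f) { return f.sf == f.of; }
static bool jns_taken(Flags f) { return !f.sf; }
static bool je_taken(Flags f)  { return f.zf; }

/* True if "cmp x,0 + jge/je" and "test x,x + jns/je" branch identically. */
static bool same_branches(uint32_t x) {
    Flags c = flags_cmp(x, 0);
    Flags t = flags_test(x, x);
    return jge_taken(c) == jns_taken(t) && je_taken(c) == je_taken(t);
}
```

Subtracting zero can never overflow, so OF is always clear after "cmp x, 0" and jge reduces to "SF clear", which is exactly what jns tests after "test x, x". same_branches() holds for the boundary values 0, 1, 7FFFFFFFh, 80000000h and FFFFFFFFh.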

hutch--

The Intel manuals have since the PIV days said that INC and DEC are slower than ADD / SUB. If an algorithm is not optimised properly, it's not like it matters, but if the rest of the algo is optimised properly, then using ADD / SUB is faster.

jj2007

Quote from: hutch-- on December 30, 2016, 08:13:32 AM
The Intel manuals have since the PIV days said that INC and DEC are slower than ADD / SUB.

Ah, I love it :t

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (SSE4)

10      cycles for 1000 * dec
10      cycles for 1000 * sub

9       cycles for 1000 * dec
9       cycles for 1000 * sub

10      cycles for 1000 * dec
10      cycles for 1000 * sub

8       cycles for 1000 * dec
10      cycles for 1000 * sub

9       cycles for 1000 * dec
11      cycles for 1000 * sub