The MASM Forum

64 bit assembler => UASM Assembler Development => Topic started by: habran on May 17, 2016, 06:30:15 AM

Title: New HJWasm release
Post by: habran on May 17, 2016, 06:30:15 AM
Hello everyone,
there is a new HJWasm release on the Terraspace (http://www.terraspace.co.uk/hjwasm.html) with some sophisticated improvements 8)
There are two new built-in options for the .SWITCH block:
option SWITCHSTYLE : ASMSTYLE   (default)
option SWITCHSTYLE:  CSTYLE       (optional)
hutch, I hope you will be happy with this one, now you have ASMSTYLE .SWITCH block, thank you for your suggestion :t
Multiple cases can be listed with the comma ',' and continued on a new line if they don't fit on one.
If you don't need .DEFAULT, it can be omitted in both ASMSTYLE and CSTYLE.
This version also reduces memory consumption with multiple cases: it creates only one jump table for all cases in both styles.
You can switch to the other style and back to the first one as many times as you want using option SWITCHSTYLE.
Here are some examples:
Quote
   mov eax, 184h
   .switch eax
   .case 179h,180h,1c5h,17bh,17dh,
   182h,184h,185h
     mov edx,1d5h
   .case 1d3h
     mov edx, 1d3h
   .case 1f4h
     mov edx, 1f4h
   .case 200h
      mov  edx, 200h
   .case 201h
     mov  edx, 201h
   .case 202h
     mov  edx, 202h
   .case 203h
     mov  edx, 203h
   .default                 
     mov edx, 0
   .endswitch

   option SWITCHSTYLE: CSTYLE

   mov eax, 184h
   .switch eax
   .case 179h, 180h, 1c5h, 17bh, 17dh,
   182h, 184h, 185h
   mov edx, 1d5h
   .break
   .case 1d3h
   mov edx, 1d3h
   .break
   .case 1f4h
   mov edx, 1f4h
   .break
   .case 200h
   mov  edx, 200h
   .break
   .case 201h
   mov  edx, 201h
   .break
   .case 202h
   mov  edx, 202h
   .break
   .case 203h
   mov  edx, 203h
   .break
   .default
   mov edx, 0
   .break
   .endswitch

option SWITCHSTYLE : ASMSTYLE

.switch bl
   .case 202
     mov  edx, 202
   .case 203
     mov  edx, 203
   .case 213
     mov  edx, 213
   .endswitch
Title: Re: New HJWasm release
Post by: jj2007 on May 17, 2016, 09:32:47 AM
Well done :t

With my 800 lines testbed, it's much faster than ML and considerably faster than Japheth's last JWasm version:
  OxPT_Assembler  mlv615 ; 44.0kB, 1070 ms
  OxPT_Assembler  mlv10 ; 44.0kB, 1070 ms
  OxPT_Assembler  JWasm ; 44.0kB, 650 ms
  OxPT_Assembler  HJWasm32 ; 44.0kB, 580 ms
  OxPT_Assembler  HJWasm64 ; 44.0kB, 580 ms
  OxPT_Assembler  asmc ; 44.0kB, 480 ms


Note that there is no measurable speed difference between the 32-bit and 64-bit versions. The latter is 63% fatter, though ;)

Same pattern with the RichMasm source (17k lines):
OxPT_Assembler mlv10 ; 1200 ms
OxPT_Assembler mlv615 ; 1200 ms
OxPT_Assembler JWasm ; 880 ms
OxPT_Assembler HJWasm32 ; 820 ms
OPT_Assembler HJWasm64 ; 820 ms
OxPT_Assembler AsmC ; 740 ms


Now the surprise when building the MasmBasic library (28k lines):
OPT_Assembler mlv615 ; 6.9 secs
OxPT_Assembler JWasm ; 5.5 secs
OxPT_Assembler HJWasm32 ; 2.8
OxPT_Assembler HJWasm64 ; 4.2 secs
OxPT_Assembler AsmC ; 2.1 secs


The 32-bit version is consistently 50% faster 8)
Title: Re: New HJWasm release
Post by: habran on May 17, 2016, 01:58:39 PM
Thanks jj2007 :biggrin:
HJWasm32 has to deal with less code, so that may be the reason,
but I suspect that your machine is more experienced in running 32 bit, hence the speed ;)
Here is an example of how much the .SWITCH block has been improved:
Quote

This code:
  mov eax, 1c5h
  .switch eax
   .case 179h,17bh,17dh,182h,184h,187h,18bh,191h,198h,
         1a0h,1a2h,1a4h,1a7h,1ach,1afh,1b3h,1b5h,1b8h,
         1bch,1c5h,1c8h,1cbh,1cdh,1cfh,1d1h,1d3h,1d5h,
         1d7h,1d9h,1dbh,1f2h,1f4h,200h
     mov  edx, 200h
   .case 201h
   mov  edx, 201h
   .case 202h
   mov  edx, 202h
   .case 203h
   mov  edx, 203h
  .default
   mov edx, 0
  .endswitch

Now it makes this:
?_021   LABEL NEAR
        sub     rsp, 472                                ; 40001165 _ 48: 81. EC, 000001D8
        mov     eax, 453                                ; 4000116C _ B8, 000001C5
        jmp     ?_023                                   ; 40001171 _ EB, 32

; Note: No jump seems to point here
        mov     edx, 512                                ; 40001173 _ BA, 00000200
        jmp     ?_026                                   ; 40001178 _ E9, 000000FF

; Note: No jump seems to point here
        mov     edx, 513                                ; 4000117D _ BA, 00000201
        jmp     ?_026                                   ; 40001182 _ E9, 000000F5

; Note: No jump seems to point here
        mov     edx, 514                                ; 40001187 _ BA, 00000202
        jmp     ?_026                                   ; 4000118C _ E9, 000000EB

; Note: No jump seems to point here
        mov     edx, 515                                ; 40001191 _ BA, 00000203
        jmp     ?_026                                   ; 40001196 _ E9, 000000E1

?_022:  mov     edx, 0                                  ; 4000119B _ BA, 00000000
        jmp     ?_026                                   ; 400011A0 _ E9, 000000D7

?_023:  cmp     eax, 515                                ; 400011A5 _ 3D, 00000203
        ja      ?_022                                   ; 400011AA _ 77, EF
        sub     eax, 377                                ; 400011AC _ 2D, 00000179
        jc      ?_022                                   ; 400011B1 _ 72, E8
        lea     rdx, ptr [?_025]                        ; 400011B3 _ 48: 8D. 15, 00000037(rel)
        movzx   rax, byte ptr [rax+rdx]                 ; 400011BA _ 48: 0F B6. 04 10
        lea     rdx, ptr [?_024]                        ; 400011BF _ 48: 8D. 15, 00000003(rel)
        jmp     qword ptr [rdx+rax*8]                   ; 400011C6 _ FF. 24 C2

?_024   label qword                                     ; switch/case jump table
        dq Unnamed_80000000_0                           ; 400011C9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400011D1 _ 000000014000117D (d)
        dq Unnamed_80000000_0                           ; 400011D9 _ 0000000140001187 (d)
        dq Unnamed_80000000_0                           ; 400011E1 _ 0000000140001191 (d)
        dq Unnamed_80000000_0                           ; 400011E9 _ 000000014000119B (d)
[?_025] label byte
    db 00 04 00 04 00 04 04 04 04 00 04 00 04 04 00 04 04 04 00 04 04 04
    db 04 04 00 04 04 04 04 04 04 00 04 04 04 04 04 04 04 00 04 00 04 00
    db 04 04 00 04 04 04 04 00 04 04 00 04 04 04 00 04 00 04 04 00 04 04
    db 04 00 04 04 04 04 04 04 04 04 00 04 04 00 04 04 00 04 00 04 00 04
    db 00 04 00 04 00 04 00 04 00 04 00 04 04 04 04 04 04 04 04 04 04 04
    db 04 04 04 04 04 04 04 04 04 04 04 00 04 00 04 04 04 04 04 04 04 04
    db 04 04 04 00 01 02 03 b8 44 00 00 00 eb 66 e8 10 fe ff ff e9 aa 00


Before:


?_021   LABEL NEAR
        sub     rsp, 472                                ; 40001165 _ 48: 81. EC, 000001D8
        mov     eax, 453                                ; 4000116C _ B8, 000001C5
        jmp     ?_023                                   ; 40001171 _ EB, 32

; Note: No jump seems to point here
        mov     edx, 512                                ; 40001173 _ BA, 00000200
        jmp     ?_026                                   ; 40001178 _ E9, 000001FF

; Note: No jump seems to point here
        mov     edx, 513                                ; 4000117D _ BA, 00000201
        jmp     ?_026                                   ; 40001182 _ E9, 000001F5

; Note: No jump seems to point here
        mov     edx, 514                                ; 40001187 _ BA, 00000202
        jmp     ?_026                                   ; 4000118C _ E9, 000001EB

; Note: No jump seems to point here
        mov     edx, 515                                ; 40001191 _ BA, 00000203
        jmp     ?_026                                   ; 40001196 _ E9, 000001E1

?_022:  mov     edx, 0                                  ; 4000119B _ BA, 00000000
        jmp     ?_026                                   ; 400011A0 _ E9, 000001D7

?_023:  cmp     eax, 515                                ; 400011A5 _ 3D, 00000203
        ja      ?_022                                   ; 400011AA _ 77, EF
        sub     eax, 377                                ; 400011AC _ 2D, 00000179
        jc      ?_022                                   ; 400011B1 _ 72, E8
        lea     rdx, ptr [?_025]                        ; 400011B3 _ 48: 8D. 15, 00000137(rel)
        movzx   rax, byte ptr [rax+rdx]                 ; 400011BA _ 48: 0F B6. 04 10
        lea     rdx, ptr [?_024]                        ; 400011BF _ 48: 8D. 15, 00000003(rel)
        jmp     qword ptr [rdx+rax*8]                   ; 400011C6 _ FF. 24 C2

?_024   label qword                                     ; switch/case jump table
        dq Unnamed_80000000_0                           ; 400011C9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400011D1 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400011D9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400011E1 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400011E9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400011F1 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400011F9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001201 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001209 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001211 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001219 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001221 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001229 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001231 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001239 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001241 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001249 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001251 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001259 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001261 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001269 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001271 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001279 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001281 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001289 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001291 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 40001299 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400012A1 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400012A9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400012B1 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400012B9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400012C1 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400012C9 _ 0000000140001173 (d)
        dq Unnamed_80000000_0                           ; 400012D1 _ 000000014000117D (d)
        dq Unnamed_80000000_0                           ; 400012D9 _ 0000000140001187 (d)
        dq Unnamed_80000000_0                           ; 400012E1 _ 0000000140001191 (d)
        dq Unnamed_80000000_0                           ; 400012E9 _ 000000014000119B (d)

    db 00 24 01 24 02 24 24 24 24 03 24 04 24 24 05 24 24 24 06 24 24 24
    db 24 24 07 24 24 24 24 24 24 08 24 24 24 24 24 24 24 09 24 0a 24 0b
    db 24 24 0c 24 24 24 24 0d 24 24 0e 24 24 24 0f 24 10 24 24 11 24 24
    db 24 12 24 24 24 24 24 24 24 24 13 24 24 14 24 24 15 24 16 24 17 24
    db 18 24 19 24 1a 24 1b 24 1c 24 1d 24 24 24 24 24 24 24 24 24 24 24
    db 24 24 24 24 24 24 24 24 24 24 24 1e 24 1f 24 24 24 24 24 24 24 24
    db 24 24 24 20 21 22 23 b8 44
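
Read side by side, the two listings suggest a two-level lookup: a byte table indexed by (value - 179h) selects a slot, and a qword table maps that slot to code. In the new build the 33 shared case values all map to slot 0, so the address table shrinks from 37 entries to 5. Below is a minimal C sketch of that idea. It is not HJWasm's generated code; the names `slot_of`, `handlers`, `build_slots` and `dispatch` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MIN_CASE = 0x179, MAX_CASE = 0x203, DEFAULT_SLOT = 4 };

/* Second level: one entry per distinct case body (5 here, not 37). */
static uint32_t on_200(void)     { return 0x200; }
static uint32_t on_201(void)     { return 0x201; }
static uint32_t on_202(void)     { return 0x202; }
static uint32_t on_203(void)     { return 0x203; }
static uint32_t on_default(void) { return 0; }

typedef uint32_t (*handler)(void);
static const handler handlers[] = { on_200, on_201, on_202, on_203, on_default };

/* First level: one byte per value in MIN_CASE..MAX_CASE (139 bytes). */
static uint8_t slot_of[MAX_CASE - MIN_CASE + 1];
static int slots_ready;

static void build_slots(void)
{
    /* The 33 values that share the first .case body in the example. */
    static const uint16_t shared[] = {
        0x179, 0x17b, 0x17d, 0x182, 0x184, 0x187, 0x18b, 0x191, 0x198,
        0x1a0, 0x1a2, 0x1a4, 0x1a7, 0x1ac, 0x1af, 0x1b3, 0x1b5, 0x1b8,
        0x1bc, 0x1c5, 0x1c8, 0x1cb, 0x1cd, 0x1cf, 0x1d1, 0x1d3, 0x1d5,
        0x1d7, 0x1d9, 0x1db, 0x1f2, 0x1f4, 0x200
    };
    /* Every value not named by a .case goes to the default slot. */
    memset(slot_of, DEFAULT_SLOT, sizeof slot_of);
    for (size_t i = 0; i < sizeof shared / sizeof shared[0]; i++)
        slot_of[shared[i] - MIN_CASE] = 0;
    slot_of[0x201 - MIN_CASE] = 1;
    slot_of[0x202 - MIN_CASE] = 2;
    slot_of[0x203 - MIN_CASE] = 3;
    slots_ready = 1;
}

uint32_t dispatch(uint32_t value)
{
    if (!slots_ready)
        build_slots();
    assert(sizeof handlers / sizeof handlers[0] == DEFAULT_SLOT + 1);
    /* Range check first, like the cmp/ja and sub/jc pair in the listing. */
    if (value < MIN_CASE || value > MAX_CASE)
        return handlers[DEFAULT_SLOT]();
    return handlers[slot_of[value - MIN_CASE]]();
}
```

With this layout, adding more values to an existing case costs nothing in the address table; only the byte table entry changes.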
   
Title: Re: New HJWasm release
Post by: jj2007 on May 17, 2016, 04:46:59 PM
Quote from: habran on May 17, 2016, 01:58:39 PM
Thanks jj2007 :biggrin:
HJWasm32 has to deal with less code, so that may be the reason,
but I suspect that your machine is more experienced in running 32 bit, hence the speed ;)
Here is an example of how much the .SWITCH block has been improved:

I would argue about the "more code" logic if my 27k lines of code consisted significantly of .switch structures, but for compatibility reasons I am still using good ol' Switch_ (http://masm32.com/board/index.php?topic=94.msg57249#msg57249) (remind me to set up a speed & size comparison between Switch_ and .switch ...)

But the "more experienced in running 32 bit" argument is certainly logical :lol:
Title: Re: New HJWasm release
Post by: TWell on May 17, 2016, 05:49:44 PM
@jj2007
here is PellesC 8 x64 version for speed test.
Title: Re: New HJWasm release
Post by: jj2007 on May 17, 2016, 09:18:53 PM
@TWell: library build exactly the same as HJWasm32, RichMasm a bit faster:
OxPT_Assembler JWasm ; 880 ms
OxPT_Assembler HJWasm32 ; 820 ms
OxPT_Assembler HJWasm64 ; 820 ms
OxPT_Assembler HJwasm64poc ; 770 ms
Title: Re: New HJWasm release
Post by: habran on May 18, 2016, 06:15:47 AM
jj2007,
Test this one please
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 08:00:58 AM
Quote from: habran on May 18, 2016, 06:15:47 AM
jj2007,
Test this one please

As fast as the 32-bit version with the RichMasm source,
but with the library, the 32-bit version is exactly 50% faster.
Title: Re: New HJWasm release
Post by: TWell on May 18, 2016, 08:16:14 AM
My test with m32lib *.asm files
HJWasm64.exe -c -coff -q \masm32\m32lib\*.asm

HJWasm32.exe 16.982s
HJWasmGcc.exe 16.712s
HJWasm64poc.exe 21.224s
HJWasm.exe 29.547s
HJWasm64.exe 30.14s
This is odd :icon_confused:
Title: Re: New HJWasm release
Post by: habran on May 18, 2016, 08:32:05 AM
OK, the last one was built with VS15 with full optimization;
this one is built with GCC via C::B.
Let's see which one is faster.
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 09:13:03 AM
This one, MB library:
last one 2.77 secs (version 504866 bytes, 18 May)
32-bit 3.15 secs (version 361472 bytes, 16 May)

VS15 has fantastic optimisations :eusa_boohoo:

Btw the last one loads remarkably well with OllyDbg, a well-known 32-bit debugger 8)
Title: Re: New HJWasm release
Post by: habran on May 18, 2016, 09:25:01 AM
Does that mean that GCC is the best version?
Title: Re: New HJWasm release
Post by: TWell on May 18, 2016, 04:25:26 PM
32-bit gcc version is fast.
m32lib again with another PC.
HJWasmGcc.exe    7.368s 7.659s
HJWasm32.exe     9.153s 9.729s
HJWasm64poc.exe  10.626s 10.664s
HJWasm64Gcc.exe  16.941s 17.214s

64-bit not so fast, compiled with v 5 with option -O2
@SET SRC=..\HJWasm-master
@SET CC=gcc -c -O2 -m64 -I..\HJWasm-master\H -DWIN64=1
@REM hjwasm64gcc.exe:
for %%c in (%SRC%\*.c) do %CC% %%c

gcc -s main.o apiemu.o assemble.o assume.o atofloat.o backptch.o bin.o branch.o cmdline.o codegen.o coff.o condasm.o context.o cpumodel.o data.o dbgcv.o directiv.o elf.o end.o equate.o errmsg.o expans.o expreval.o extern.o fastpass.o fixup.o fpfixup.o hll.o input.o invoke.o label.o linnum.o listing.o loop.o lqueue.o macro.o mangle.o memalloc.o msgtext.o omf.o omffixup.o omfint.o option.o parser.o posndir.o preproc.o proc.o queue.o reswords.o safeseh.o segment.o simsegm.o string.o symbols.o tbyte.o tokenize.o types.o -o hjwasm64gcc.exe

@DEL *.o
Is something wrong in my test? :icon_confused:
Title: Re: New HJWasm release
Post by: habran on May 18, 2016, 05:01:55 PM
I built my GCC version with -O3, and you can see a reduction in size.
Are you saying that -O2 produces faster code than -O3?

I can also see that Pelles C produces even less code than GCC.
Are you sure that HJWasmPoc.64 is the fastest 64-bit version?

Let's show only 64-bit speed for all versions
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 05:18:07 PM
Quote from: habran on May 18, 2016, 05:01:55 PM
Let's show only 64-bit speed for all versions

OK, but don't distort the competition by smuggling in (Reply #9) 32-bit versions called HJWasm64 :eusa_naughty:

GCC yes, but no doping please :t
Title: Re: New HJWasm release
Post by: habran on May 18, 2016, 05:30:15 PM
I was not aware of that :icon_eek:
Here it is:
mingw32-gcc.exe -O3 -DWIN64 -DNDEBUG -I.\H -I"C:\Program Files (x86)\mingw-w64\i686-4.9.2-posix-dwarf-rt_v3-rev0\mingw32\i686-w64-mingw32\bin" -I"C:\Program Files (x86)\CodeBlocks\MinGW\bin" -c "C:\Users\Brane
Title: Re: New HJWasm release
Post by: TWell on May 18, 2016, 05:38:08 PM
-m64 isn't in the command line, so the result was 32-bit.
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 05:38:56 PM
It seems you picked the right options this time - almost as fast as AsmC :t

OxPT_Assembler HJWasm32 ; 3.15
OxPT_Assembler HJWasm64 ; 2.80 secs


Btw where are your bottlenecks? Is loading string arrays and finding matches in these strings one of them?
- the FAST option is typically about twice as fast as CRT strstr, but 3..4 times as fast when used with
  MasmBasic string arrays (Intel Core i5 timings for counting a rare word in an 800 MB file with 6 million lines):
232 ms for fast Instr_
795 ms for "normal" Instr_
999 ms for Masm32 InString
929 ms for CRT strstr
;)
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 05:40:34 PM
Quote from: TWell on May 18, 2016, 05:38:08 PM
-m64 isn't in the command line, so the result was 32-bit.

Oops, you are right - doping alarm :dazzled:

Btw you could check if you have an old GCC version. Google finds plenty of hits for gcc 64-bit slower than 32-bit, many of them around spring 2014. So maybe the developers have saved the honour of 64-bit compilers in the meantime 8)
Title: Re: New HJWasm release
Post by: habran on May 18, 2016, 07:34:55 PM
OK, let's see this one, is it x64 ::)
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 08:08:48 PM
Quote from: TWell on May 18, 2016, 04:25:26 PM
Is something wrong in my test? :icon_confused:

OPT_Assembler hjwasm32gcc3 ; 2.7
OxPT_Assembler hjwasm64gcc ; 6.1
OPT_Assembler AsmC ; 2.1 secs

::)
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 08:32:01 PM
Quote from: habran on May 18, 2016, 07:34:55 PM
OK, let's see this one, is it x64 ::)

2.55 secs, so far your best one :t

(one little problem: Olly says it's 32-bit code...)
Title: Re: New HJWasm release
Post by: nidud on May 18, 2016, 09:02:29 PM
deleted
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 09:48:54 PM
Quote from: nidud on May 18, 2016, 09:02:29 PM
The switch still trashes registers

That seems not the only problem. Here is a snippet - try with the latest AsmC and HJWasm versions 8)
include \masm32\include\masm32rt.inc

.code
start:
  m2m edi, -5
  .Repeat
    print chr$(13, 10)
    print str$(edi), 9
    .switch edi
    .case -4
      print "case -4 "
      ; .break
    .case -2
      print "case -2 "
      ; .break
    .case 0
      print "case 0 "
      .break
    .case 2
      print "case +2 "
      .break
    .case 4
      print "case +4 "
      .break
    .Default
      print "default"
    .Endsw
    inc edi
  .Until sdword ptr edi>5
  inkey chr$(13, 10, "--- ok? ---")
  exit
end start
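
For comparison, here is how the same test behaves under plain C rules, which CSTYLE presumably follows (an assumption; the thread does not spell out the exact semantics). A case without `break` falls through to the next one, and `break` leaves only the switch, never the enclosing loop. The function and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { FIRST = -5, LAST = 5, ROWS = LAST - FIRST + 1, ROW_LEN = 32 };

/* Append s to a row buffer without overflowing it. */
static void add(char *row, const char *s)
{
    strncat(row, s, ROW_LEN - strlen(row) - 1);
}

/* Run the -5..5 loop of the snippet above and record, per value,
   which case labels were executed. Returns the number of iterations. */
int run_test(char rows[ROWS][ROW_LEN])
{
    int visited = 0;
    assert(rows != NULL);
    for (int v = FIRST; v <= LAST; v++) {
        char *row = rows[v - FIRST];
        row[0] = '\0';
        switch (v) {
        case -4:
            add(row, "case -4 ");
            /* fall through */
        case -2:
            add(row, "case -2 ");
            /* fall through */
        case 0:
            add(row, "case 0 ");
            break;              /* leaves the switch, not the for loop */
        case 2:
            add(row, "case +2 ");
            break;
        case 4:
            add(row, "case +4 ");
            break;
        default:
            add(row, "default");
        }
        visited++;
    }
    return visited;
}
```

Under these rules the loop always completes all 11 iterations, and the row for -4 lists three labels because two breaks are commented out.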
Title: Re: New HJWasm release
Post by: nidud on May 18, 2016, 10:38:12 PM
deleted
Title: Re: New HJWasm release
Post by: jj2007 on May 18, 2016, 11:32:39 PM
With AsmC, no options but .break instead, the .break causes an exit of the .Repeat loop. Is that "by design"?
Title: Re: New HJWasm release
Post by: habran on May 18, 2016, 11:35:35 PM
Hi nidud,
The code below is a brilliant idea, and I like it a lot. However, it is not applicable in 64 bit because it would force a linker LARGEADDRESS error. While I was writing the code my mind was focused on 64 bit, and 32 bit was just a conversion from 64 bit. Now I see that I have to start thinking from the 32-bit side if I write 32-bit code.

cmp eax,min
jl endsw
cmp eax,max
jg endsw
push eax
movzx eax,index[eax-min]
mov eax,table[eax*4]
xchg eax,[esp]
retn

In the case of jj2007's source, I don't think that any ASM programmer would write code that trashes registers that are needed for the next iteration.
You can, of course, push a register on the stack, then purposely overwrite that memory on the stack, and then complain that the assembler is no good because it let you overwrite the stack space ::)
If that happened, I would not feel sorry for that "programmer"
Title: Re: New HJWasm release
Post by: nidud on May 19, 2016, 12:07:59 AM
deleted
Title: Re: New HJWasm release
Post by: jj2007 on May 19, 2016, 12:11:07 AM
Quote from: habran on May 18, 2016, 11:35:35 PM
In the case of jj2007's source, I don't think that any ASM programmer would write code that trashes registers that are needed for the next iteration ... would not feel sorry for that "programmer"

So far, I have tried to be helpful. Show me one occasion where I insulted you as a "programmer" or similar.

Besides, this is obviously valid code, since edi is a non-volatile register. I perfectly understand why you are pissed off, but please concentrate on your homework instead of attacking others.
Title: Re: New HJWasm release
Post by: habran on May 19, 2016, 12:21:33 AM
My intention was not to attack you jj2007, I am grateful for your help and cooperation, and I already said before how much I appreciate you as a programmer, however, I am sure that you would never write this construction in your programs, and you have to admit that :biggrin:
Title: Re: New HJWasm release
Post by: habran on May 19, 2016, 12:25:15 AM
"programmer" was not pointed to you but to someone who would write some program to delete his stack
Title: Re: New HJWasm release
Post by: nidud on May 19, 2016, 01:06:14 AM
deleted
Title: Re: New HJWasm release
Post by: jj2007 on May 19, 2016, 01:59:36 AM
Quote from: habran on May 19, 2016, 12:21:33 AM
I am sure that you would never write this construction in your programs, and you have to admit that :biggrin:

OK, let's declare it a misunderstanding. But now I am curious: Where in my code do I trash the stack that I need later on?
Title: Re: New HJWasm release
Post by: jj2007 on May 19, 2016, 06:59:53 AM
Quote from: nidud on May 19, 2016, 12:07:59 AM
Ah, finally: Did you RTFM  :lol:

No, I was busy reading the rest of the Internet 8)

Anyway, latest results from my switch testbed:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with HJWasm32
24 ms   case 260, MB Switch_ table
230 ms  case 260, MB Switch_ chain
455 ms  case 260, Masm32 switch
31 ms   case 260, HJWasm .Switch
6 ms    case 260, AsmC .Switch

24 ms   case 196, MB Switch_ table
178 ms  case 196, MB Switch_ chain
341 ms  case 196, Masm32 switch
38 ms   case 196, HJWasm .Switch
6 ms    case 196, AsmC .Switch

23 ms   case 132, MB Switch_ table
127 ms  case 132, MB Switch_ chain
229 ms  case 132, Masm32 switch
22 ms   case 132, HJWasm .Switch
6 ms    case 132, AsmC .Switch

23 ms   case 68, MB Switch_ table
76 ms   case 68, MB Switch_ chain
120 ms  case 68, Masm32 switch
40 ms   case 68, HJWasm .Switch
7 ms    case 68, AsmC .Switch

23 ms   case 4, MB Switch_ table
24 ms   case 4, MB Switch_ chain
6 ms    case 4, Masm32 switch
38 ms   case 4, HJWasm .Switch
6 ms    case 4, AsmC .Switch

2989    bytes for MbTable
4840    bytes for MbChain
4799    bytes for Masm32
6978    bytes for hjwasm
4208    bytes for asmc


The last AsmC row was added "by hand" because obviously you can't assemble the source with both assemblers at the same time. If you want to build it yourself, open the source in RichMasm and press Ctrl End. The OPT_Assembler rows should speak for themselves. OxPT is a disabled one (RichMasm looks for a case-sensitive OPT_). If two options are active, the last one is valid (I know, I know, modern IDEs have somewhere a project options menu where you can set the assembler if you find the right menu item; RM is very old fashioned, sorry).
Title: Re: New HJWasm release
Post by: TWell on May 19, 2016, 07:20:44 PM
DELETED
Title: Re: New HJWasm release
Post by: jj2007 on May 19, 2016, 08:35:00 PM
64-bit version is faster:
OxPT_Assembler hjwasm32msvcrt ; 7.6
OxPT_Assembler hjwasm64msvcrt ; 6.2
OxPT_Assembler AsmC ; 2.5 secs
Title: Re: New HJWasm release
Post by: TWell on May 19, 2016, 09:43:14 PM
DELETED
Title: Re: New HJWasm release
Post by: jj2007 on May 19, 2016, 10:03:10 PM
2.66 secs :t

Here are all my current timings:
HJWasmTWell ; 7.8 secs (9 May)
mlv10 ; 7.8 secs
mlv615 ; 7.0 secs - use for release version
JWasm ; 5.5 secs
HJWasm32 ; 3.15
HJWasm64 ; 2.80 secs
HJwasm64poc ; 2.75
hjwasm32gcc3 ; 2.7
hjwasm64gcc ; 6.1
HJWasm64Habran ; 2.55 secs, but it's 32-bit code
hjwasm32msvcrt ; 7.6
hjwasm64msvcrt ; 6.2
hjwasm32msv13 ; 2.65
AsmC ; 2.5 secs (used to be 2.1...)


In practice, I use AsmC for testing, not only because it's fastest but also because it gives direct feedback, i.e. you can see
Assembling: C:\Masm32\MasmBasic\libtmpAA.asm
Assembling: C:\Masm32\MasmBasic\libtmpAB.asm
Assembling: C:\Masm32\MasmBasic\libtmpAC.asm
Assembling: C:\Masm32\MasmBasic\libtmpAD.asm

while it is assembling. JWasm and ML 6.15 do the same, most others let you wait until everything is complete, which is less nice to watch. But that is a very personal preference, of course 8)

Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:
Title: Re: New HJWasm release
Post by: jj2007 on May 20, 2016, 09:00:52 AM
Quote from: jj2007 on May 19, 2016, 10:03:10 PM
Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:

Thanks, Tim :t

2669 ms, 9459844 time(s): address 004386A0 _SymFind                   004386a0 f   symbols.obj
2647 ms, 4850279 time(s): address 00420400 _my_fgets                  00420400 f   input.obj
1383 ms, 10972418 time(s): address 0043A300 _get_id                    0043a300 f   tokenize.obj
1242 ms, 4800861 time(s): address 0043A6A0 _Tokenize                  0043a6a0 f   tokenize.obj
1136 ms, 10357549 time(s): address 00434DF0 _FindResWord               00434df0 f   reswords.obj
1041 ms, 9461863 time(s): address 004384D0 _hashpjw                   004384d0 f   symbols.obj
  921 ms, 16118880 time(s): address 0043A540 _GetToken                  0043a540 f   tokenize.obj
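
The profile shows _SymFind and _hashpjw each being called about 9.46 million times, i.e. roughly one hash per symbol lookup. For readers unfamiliar with the pattern, here is a sketch of a chained symbol table built on the textbook PJW hash (the version from Aho, Sethi and Ullman). This only illustrates the general technique the function names suggest; HJWasm's own hashpjw and SymFind in symbols.c may differ, for instance in case folding or table size. The names `pjw_hash`, `sym_add`, `sym_find`, `BUCKETS` and `POOL_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUCKETS   211u   /* small prime, as in the textbook version */
#define POOL_SIZE 64     /* fixed node pool keeps the sketch malloc-free */

struct sym {
    const char *name;
    int value;
    struct sym *next;    /* next symbol in the same bucket */
};

static struct sym pool[POOL_SIZE];
static size_t pool_used;
static struct sym *buckets[BUCKETS];

/* Textbook PJW hash: shift in 4 bits per character and fold the
   top nibble back into the low bits whenever it becomes non-zero. */
uint32_t pjw_hash(const char *s, uint32_t table_size)
{
    uint32_t h = 0, g;
    assert(s != NULL && table_size > 0);
    for (; *s != '\0'; s++) {
        h = (h << 4) + (unsigned char)*s;
        g = h & 0xf0000000u;
        if (g != 0) {
            h ^= g >> 24;
            h ^= g;
        }
    }
    return h % table_size;
}

/* One hash plus a walk along the bucket chain per lookup. */
struct sym *sym_find(const char *name)
{
    struct sym *p = buckets[pjw_hash(name, BUCKETS)];
    while (p != NULL && strcmp(p->name, name) != 0)
        p = p->next;
    return p;
}

/* Insert or update; returns NULL only when the pool is exhausted. */
struct sym *sym_add(const char *name, int value)
{
    struct sym *p = sym_find(name);
    if (p != NULL) {
        p->value = value;
        return p;
    }
    if (pool_used == POOL_SIZE)
        return NULL;
    p = &pool[pool_used++];
    uint32_t b = pjw_hash(name, BUCKETS);
    p->name = name;
    p->value = value;
    p->next = buckets[b];
    buckets[b] = p;
    return p;
}
```

In a design like this, the per-lookup cost is dominated by the hash loop and the string compares along the chain, which is one plausible reason why both functions rank so high in the profile.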
Title: Re: New HJWasm release
Post by: TWell on May 20, 2016, 09:54:19 PM
DELETED
Title: Re: New HJWasm release
Post by: jj2007 on May 20, 2016, 11:32:35 PM
3.0 secs, so far the best 64-bit version (AsmC: 2.5 secs).
But 32-bit version is 10% faster:

OxPT_Assembler hjwasm642005DDK ; 3.0
OPT_Assembler hjwasm322005DDK ; 2.7
OxPT_Assembler AsmC ; 2.5 secs


What about _SymFind and _my_fgets? Long and complicated, or is there a chance to give them a boost?
Title: Re: New HJWasm release
Post by: johnsa on May 23, 2016, 11:04:21 PM
Hey,

Do we have any clear indication as to why a C project compiled with an 11 year old version of MSVC is so much faster than if compiled with VS2015 ??
Is it purely down to the CRT inclusions being more bloated/less performant?
Title: Re: New HJWasm release
Post by: TWell on May 23, 2016, 11:10:51 PM
MS MT CRT fault.
Here are versions compiled with cl v19, using the 2003 DDK libc.lib:
5.325s        asmc
6.521s        hjwa64-2015clib.exe
7.180s        hjwa32-2015clib.exe
Title: Re: New HJWasm release
Post by: jj2007 on May 24, 2016, 12:18:45 AM
Quote from: johnsa on May 23, 2016, 11:04:21 PM
why a C project compiled with an 11 year old version of MSVC is so much faster than if compiled with VS2015 ??

Compilers develop. We are all running extremely old CPUs, new compilers optimise for the latest CPUs 8)
Title: Re: New HJWasm release
Post by: habran on May 24, 2016, 12:25:47 AM
Hi TWell,
There is a new HJWasm on Terraspace built with your tools, thank you :t
as well as improved source on GitHub.
The one you built above doesn't debug at source level in 32 bit.
Title: Re: New HJWasm release
Post by: habran on May 24, 2016, 12:33:47 AM
Sorry JJ, I have been busy these days fixing HJWasm :biggrin:
Try it now please, it is updated on Terraspace 8)
Title: Re: New HJWasm release
Post by: TWell on May 24, 2016, 01:40:06 AM
Quote from: habran on May 24, 2016, 12:25:47 AM
This one you built above doesn't debug on source level in 32 bit
That AVX was missing too :redface:
I was testing m32lib compile only as there is a lot file access.

PS: does anyone have the old Vista DDK? Does it include libc.lib (VS 2005?)
Title: Re: New HJWasm release
Post by: jj2007 on May 24, 2016, 02:09:47 AM
Quote from: habran on May 24, 2016, 12:33:47 AM
Sorry JJ, I have been busy these days fixing HJWasm :biggrin:
Try it now please, it is updated on Terraspace 8)

OxPT_Assembler HJWasm32 ; 2.7
OxPT_Assembler HJWasm64 ; 3.0 secs, and yes, it's 64-bit code
OxPT_Assembler AsmC ; 2.5 secs

:t
Title: Re: New HJWasm release
Post by: habran on May 24, 2016, 06:04:42 AM
Thanks JJ :t
So, taking into consideration that ASMC has its most important parts translated to asm, and has less code to run, that is amazing speed.
With HJWasm64 you are testing 32-bit code optimized for 32 bit, which is why it is slower than HJWasm32;
we should try the opposite. I am planning to write some code optimized for x64, and then we will see how it performs 8)
Title: Re: New HJWasm release
Post by: TWell on May 24, 2016, 04:04:29 PM
From VC6 samples:
# When building single-threaded applications you can link your executable
# with either LIBC, LIBCMT, or CRTDLL, although LIBC will provide the best
# performance.

Title: Re: New HJWasm release
Post by: habran on May 24, 2016, 04:11:05 PM
AFAIK LIBCMT is for static builds and LIBC for dynamic builds.
Where is that LIBC.LIB which you gave me the link for?
Isn't it from msvc2005?

Title: Re: New HJWasm release
Post by: TWell on May 24, 2016, 04:37:18 PM
libc.lib is a static library.
libc.lib can be found in the Windows NT 5.2 DDK (Server 2003 SP1), and the x64 one in 5.2.3790.2075.51.PlatformSDK_Svr2003R2_rtm too.

I can't inspect Vista DDK :(
Title: Re: New HJWasm release
Post by: habran on May 25, 2016, 07:54:54 PM
Hi TWell,
The last one you built is working fine, but I can't test it for speed.
I have fixed some minor errors in hll.c and added a new feature to the .SWITCH block, so we will upgrade tonight or tomorrow.
If you give me your email address, I will send you the new hll.c for testing.
Title: Re: New HJWasm release
Post by: TWell on May 27, 2016, 05:45:57 PM
With VisualCppBuildTools2015
5.418s        asmc
8.596s        HJWasm64.exe
12.372s       hjwa64-2015.exe

Not bad eh??
Title: Re: New HJWasm release
Post by: habran on May 27, 2016, 07:32:05 PM
VS2015 sucks :(
I prefer this one:
OxPT_Assembler HJWasm32 ; 2.7
OxPT_Assembler HJWasm64 ; 3.0 secs, and yes, it's 64-bit code
OxPT_Assembler AsmC ; 2.5 secs

:t
Title: Re: New HJWasm release
Post by: jj2007 on May 29, 2016, 06:35:58 AM
Quote from: jj2007 on May 19, 2016, 10:03:10 PM
In practice, I use AsmC for testing, not only because it's fastest but also because it gives direct feedback, i.e. you can see
Assembling: C:\Masm32\MasmBasic\libtmpAA.asm
Assembling: C:\Masm32\MasmBasic\libtmpAB.asm
Assembling: C:\Masm32\MasmBasic\libtmpAC.asm
Assembling: C:\Masm32\MasmBasic\libtmpAD.asm

while it is assembling. JWasm and ML 6.15 do the same, most others let you wait until everything is complete, which is less nice to watch. But that is a very personal preference, of course 8)

Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:

Re "others let you wait until everything is complete": would fflush(..) after each module help?
Title: Re: New HJWasm release
Post by: habran on June 01, 2016, 07:36:14 PM
New HJWasm uploaded on Terraspace (http://www.terraspace.co.uk/hjwasm.html) with some bug fixes and hopefully some speed improvement for the .SWITCH block hll
Title: Re: New HJWasm release
Post by: jj2007 on June 01, 2016, 11:33:21 PM
Timings for building the MB library are unchanged.

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with HJWasm32
20 ms   case 260, MB Switch_ table
197 ms  case 260, MB Switch_ chain
370 ms  case 260, Masm32 switch
46 ms   case 260, HJWasm .Switch

19 ms   case 196, MB Switch_ table
146 ms  case 196, MB Switch_ chain
277 ms  case 196, Masm32 switch
46 ms   case 196, HJWasm .Switch

19 ms   case 132, MB Switch_ table
104 ms  case 132, MB Switch_ chain
186 ms  case 132, Masm32 switch
46 ms   case 132, HJWasm .Switch

19 ms   case 68, MB Switch_ table
62 ms   case 68, MB Switch_ chain
98 ms   case 68, Masm32 switch
46 ms   case 68, HJWasm .Switch

19 ms   case 4, MB Switch_ table
20 ms   case 4, MB Switch_ chain
5 ms    case 4, Masm32 switch
46 ms   case 4, HJWasm .Switch

2989    bytes for MbTable
4840    bytes for MbChain
4799    bytes for Masm32
5729    bytes for hjwasm
Title: Re: New HJWasm release
Post by: habran on June 02, 2016, 12:00:39 AM
Thanks JJ,
It looks like we at least have stable speed; considering it is written in C, that is pretty good.
It is probably possible to make it a little bit faster with some more optimization.
Title: Re: New HJWasm release
Post by: habran on June 03, 2016, 09:33:15 PM
Hi JJ,
Can you please test this build with the same sources you used with the last one, to see if there is a difference in speed?
Title: Re: New HJWasm release
Post by: johnsa on June 03, 2016, 10:34:51 PM
Aren't the timings JJ provided run-time execution of the switch rather than compile-time related?
Title: Re: New HJWasm release
Post by: habran on June 03, 2016, 11:19:44 PM
Yes, that is what I want to be tested.
I think that maybe I have a faster sorting routine in this one.
Title: Re: New HJWasm release
Post by: jj2007 on June 03, 2016, 11:26:36 PM
Quote from: habran on June 03, 2016, 09:33:15 PM
Can you please test this build with the same sources you used with the last one, to see if there is a difference in speed?

Hi Habran,
Build speed for MasmBasic library is very good, only 10% slower now than AsmC :t
Would be nice to flush the console after each assembly, though.
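
For illustration, the flush idea (the fflush(..) suggestion from earlier in the thread) could look roughly like the sketch below. This is only a minimal sketch: report_module is a hypothetical helper name, not a function from the HJWasm or JWasm sources, and where the real assembler prints its "Assembling:" line is not shown in this thread.

```c
#include <stdio.h>

/* Hypothetical helper (not taken from the HJWasm sources): print the
   progress line for one module and flush the stream right away, so the
   line appears while the build is still running instead of when the
   output buffer happens to fill up.
   Returns 0 on success, a negative value or EOF on failure. */
int report_module(FILE *out, const char *path)
{
    if (fprintf(out, "Assembling: %s\n", path) < 0)
        return -1;

    /* fflush forces buffered output to be written to the stream now. */
    return fflush(out);
}
```

Calling report_module(stdout, name) once per module, just before that module is assembled, should be enough. Flushing only matters when stdout is buffered, which is typically the case when the output is redirected to a pipe or captured by an editor.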

Quote from: johnsa on June 03, 2016, 10:34:51 PM
Aren't the timings JJ provided run-time execution of the switch rather than compile-time related?

Here is run-time execution of the switch:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with AsmC
20 ms   case 260, MB Switch_ table
193 ms  case 260, MB Switch_ chain
374 ms  case 260, Masm32 switch
5 ms    case 260, AsmC .Switch

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with HJWasm32
20 ms   case 260, MB Switch_ table
190 ms  case 260, MB Switch_ chain
374 ms  case 260, Masm32 switch
48 ms   case 260, HJWasm .Switch


And the size of the generated switch codes:
2989    bytes for MbTable
4840    bytes for MbChain
4799    bytes for Masm32
4201    bytes for asmc
5729    bytes for hjwasm


The AsmC .Switch is clearly fastest, while the MasmBasic Switch_ macro generates the smallest code for high case numbers (where it auto-selects the compact table version).
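
A rough C sketch of why the "chain" timings above grow with the case number while the "table" timings stay flat. This only illustrates the two dispatch strategies in general; it is not the code that the MasmBasic Switch_ macro, the Masm32 switch macro, HJWasm or AsmC actually generate, and all names in it (CASE_COUNT, chain_dispatch, table_dispatch, the handlers) are made up for the example.

```c
#define CASE_COUNT 4   /* hypothetical: cases 0..3 */

/* One handler per case; each returns a distinct value so the result
   of a dispatch can be checked. */
static int handle0(void) { return 100; }
static int handle1(void) { return 101; }
static int handle2(void) { return 102; }
static int handle3(void) { return 103; }

typedef int (*handler_fn)(void);

static const handler_fn handlers[CASE_COUNT] = {
    handle0, handle1, handle2, handle3
};

/* Compare chain: models a cmp/je sequence that tests the case values
   one after another. The number of compares grows with the position
   of the matching case. Returns -1 for the default case. */
int chain_dispatch(int value, int *compares)
{
    int c;

    *compares = 0;
    for (c = 0; c < CASE_COUNT; c++) {
        (*compares)++;
        if (value == c)
            return handlers[c]();
    }
    return -1;
}

/* Jump table: one range check, then one indexed call, regardless of
   which case matches. Returns -1 for the default case. */
int table_dispatch(int value, int *compares)
{
    *compares = 1;   /* counts the single range check */
    if (value < 0 || value >= CASE_COUNT)
        return -1;
    return handlers[value]();
}
```

With this model the last case costs CASE_COUNT compares in the chain but only one range check in the table, which matches the pattern in the numbers above: the chain versions are quick for case 4 and slow for case 260, while the table version takes about the same time everywhere.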
Title: Re: New HJWasm release
Post by: habran on June 03, 2016, 11:43:07 PM
Thanks JJ, it looks like the former one is 2 ms faster :(
Title: Re: New HJWasm release
Post by: Raistlin on June 08, 2016, 08:04:18 PM
So I downloaded HJWasm - and lo and behold - when I tried to email it to myself
Google (Gmail scanner) says it contains malware... sheesh

I know Habran would never include such a thing, but which file could be the cause of the false positive?
Title: Re: New HJWasm release
Post by: johnsa on June 08, 2016, 08:33:44 PM
There is definitely no malware in the archives. I would assume its heuristic scanning picks up that the exe generates code, which it doesn't like (possibly because of an unfamiliar name or origin), or alternatively that the archive contains asm files.
Is there any way to get a more detailed report from the scan as to exactly what it's not happy with?
Title: Re: New HJWasm release
Post by: jj2007 on June 08, 2016, 09:07:03 PM
If you are in doubt, upload the file to Jotti: nothing for HJWasm32 (https://virusscan.jotti.org/en-US/filescanjob/auaptrk7ij). As johnsa wrote, the AV scanners use heuristics, and those are a PITA (Pain In The A**). As soon as they see something that was apparently not built with MSVC or GCC, they make racist remarks about assembler code being dangerous etc 8)

Btw we have a dedicated sub-forum for that: AV Software sh*t list (http://masm32.com/board/index.php?board=23.0)
Title: Re: New HJWasm release
Post by: habran on June 09, 2016, 07:35:20 PM
New HJWasm uploaded on Terraspace with some more bug fixes in the .SWITCH block.