New HJWasm release

Started by habran, May 17, 2016, 06:30:15 AM

habran

"Programmer" was not aimed at you but at someone who would write a program that trashes his own stack
Cod-Father

nidud

#31
deleted

jj2007

Quote from: habran on May 19, 2016, 12:21:33 AM
I am sure that you would never write this construction in your programs, and you have to admit that :biggrin:

OK, let's declare it a misunderstanding. But now I am curious: Where in my code do I trash the stack that I need later on?

jj2007

Quote from: nidud on May 19, 2016, 12:07:59 AM
Ah, finally: Did you RTFM :lol:

No, I was busy reading the rest of the Internet 8)

Anyway, latest results from my switch testbed:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with HJWasm32
24 ms   case 260, MB Switch_ table
230 ms  case 260, MB Switch_ chain
455 ms  case 260, Masm32 switch
31 ms   case 260, HJWasm .Switch
6 ms    case 260, AsmC .Switch

24 ms   case 196, MB Switch_ table
178 ms  case 196, MB Switch_ chain
341 ms  case 196, Masm32 switch
38 ms   case 196, HJWasm .Switch
6 ms    case 196, AsmC .Switch

23 ms   case 132, MB Switch_ table
127 ms  case 132, MB Switch_ chain
229 ms  case 132, Masm32 switch
22 ms   case 132, HJWasm .Switch
6 ms    case 132, AsmC .Switch

23 ms   case 68, MB Switch_ table
76 ms   case 68, MB Switch_ chain
120 ms  case 68, Masm32 switch
40 ms   case 68, HJWasm .Switch
7 ms    case 68, AsmC .Switch

23 ms   case 4, MB Switch_ table
24 ms   case 4, MB Switch_ chain
6 ms    case 4, Masm32 switch
38 ms   case 4, HJWasm .Switch
6 ms    case 4, AsmC .Switch

2989    bytes for MbTable
4840    bytes for MbChain
4799    bytes for Masm32
6978    bytes for hjwasm
4208    bytes for asmc


The last AsmC row was added "by hand" because obviously you can't assemble the source with both assemblers at the same time. If you want to build it yourself, open the source in RichMasm and press Ctrl End. The OPT_Assembler rows should speak for themselves. An OxPT_ row is a disabled one (RichMasm looks for a case-sensitive OPT_). If two options are active, the last one wins (I know, I know, modern IDEs have a project options menu somewhere where you can set the assembler, if you find the right menu item; RM is very old-fashioned, sorry).
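
For readers wondering why the "chain" and Masm32 timings above grow with the case number while the "table" timings stay flat, here is a minimal C sketch of the two dispatch strategies. It is not the MasmBasic, Masm32, HJWasm or AsmC implementation; the function names, the compare counter and the table layout are made up for illustration only.

```c
#include <stddef.h>

/* Compare chain: test the cases one after another.
   Returns the index of the matching case, or -1 for "default".
   *compares receives the number of comparisons performed. */
int dispatch_chain(int value, const int *cases, size_t n, size_t *compares)
{
    size_t i;
    *compares = 0;
    for (i = 0; i < n; i++) {
        (*compares)++;
        if (cases[i] == value)
            return (int)i;
    }
    return -1;
}

/* Jump table: one range check, then one indexed lookup.
   table[value - lo] holds the case index, or -1 for gaps in the range. */
int dispatch_table(int value, const int *table, int lo, int hi)
{
    if (value < lo || value > hi)
        return -1;
    return table[value - lo];
}
```

With an assumed layout of 65 cases (4, 8, ..., 260), the chain needs 1 compare to reach case 4 and 65 compares to reach case 260, while the table lookup costs the same for both. The price of the table is memory for the gaps in the range.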

TWell

#34
DELETED

jj2007

64-bit version is faster:
OxPT_Assembler hjwasm32msvcrt ; 7.6
OxPT_Assembler hjwasm64msvcrt ; 6.2
OxPT_Assembler AsmC ; 2.5 secs

TWell

#36
DELETED

jj2007

2.66 secs :t

Here are all my current timings:
HJWasmTWell ; 7.8 secs (9 May)
mlv10 ; 7.8 secs
mlv615 ; 7.0 secs - use for release version
JWasm ; 5.5 secs
HJWasm32 ; 3.15
HJWasm64 ; 2.80 secs
HJwasm64poc ; 2.75
hjwasm32gcc3 ; 2.7
hjwasm64gcc ; 6.1
HJWasm64Habran ; 2.55 secs, but it's 32-bit code
hjwasm32msvcrt ; 7.6
hjwasm64msvcrt ; 6.2
hjwasm32msv13 ; 2.65
AsmC ; 2.5 secs (used to be 2.1...)


In practice, I use AsmC for testing, not only because it's the fastest but also because it gives direct feedback, i.e. you can see
Assembling: C:\Masm32\MasmBasic\libtmpAA.asm
Assembling: C:\Masm32\MasmBasic\libtmpAB.asm
Assembling: C:\Masm32\MasmBasic\libtmpAC.asm
Assembling: C:\Masm32\MasmBasic\libtmpAD.asm

while it is assembling. JWasm and ML 6.15 do the same; most others make you wait until everything is complete, which is less nice to watch. But that is a very personal preference, of course 8)

Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:

jj2007

Quote from: jj2007 on May 19, 2016, 10:03:10 PM
Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:

Thanks, Tim :t

2669 ms, 9459844 time(s): address 004386A0 _SymFind                   004386a0 f   symbols.obj
2647 ms, 4850279 time(s): address 00420400 _my_fgets                  00420400 f   input.obj
1383 ms, 10972418 time(s): address 0043A300 _get_id                    0043a300 f   tokenize.obj
1242 ms, 4800861 time(s): address 0043A6A0 _Tokenize                  0043a6a0 f   tokenize.obj
1136 ms, 10357549 time(s): address 00434DF0 _FindResWord               00434df0 f   reswords.obj
1041 ms, 9461863 time(s): address 004384D0 _hashpjw                   004384d0 f   symbols.obj
  921 ms, 16118880 time(s): address 0043A540 _GetToken                  0043a540 f   tokenize.obj
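
The call counts of _SymFind and _hashpjw in the profile above are almost identical, which suggests one hash computation per symbol lookup. For reference, this is roughly what a hashed symbol lookup looks like, using the textbook PJW hash. It is only a sketch: the names pjw_hash, sym_add and sym_find, the struct layout and the bucket count of 211 are assumptions for illustration, and the assembler's own routines may well differ (for example in shift widths or case folding).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 211u            /* hypothetical table size (a prime) */

struct sym {
    const char *name;
    struct sym *next;            /* next symbol in the same bucket */
};

static struct sym *buckets[NBUCKETS];

/* Textbook PJW hash: shift in 4 bits per character and fold
   the top nibble back into the low bits when it becomes non-zero. */
uint32_t pjw_hash(const char *s)
{
    uint32_t h = 0, g;
    for (; *s; s++) {
        h = (h << 4) + (unsigned char)*s;
        g = h & 0xF0000000u;
        if (g) {
            h ^= g >> 24;
            h ^= g;              /* clear the top nibble */
        }
    }
    return h;
}

/* Insert at the head of the bucket chain. */
void sym_add(struct sym *sym)
{
    uint32_t i = pjw_hash(sym->name) % NBUCKETS;
    sym->next = buckets[i];
    buckets[i] = sym;
}

/* One hash, then a linear walk along the bucket chain. */
struct sym *sym_find(const char *name)
{
    struct sym *p = buckets[pjw_hash(name) % NBUCKETS];
    while (p && strcmp(p->name, name) != 0)
        p = p->next;
    return p;
}
```

In a scheme like this, the cost per lookup is the hash over the whole name plus one strcmp per chain entry, so the usual candidates for a boost are a larger table (shorter chains) and comparing stored hash values before calling strcmp.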

TWell

#39
DELETED

jj2007

#40
3.0 secs, so far the best 64-bit version (AsmC: 2.5 secs).
But the 32-bit version is 10% faster:

OxPT_Assembler hjwasm642005DDK ; 3.0
OPT_Assembler hjwasm322005DDK ; 2.7
OxPT_Assembler AsmC ; 2.5 secs


What about _SymFind and _my_fgets? Long and complicated, or is there a chance to give them a boost?

johnsa

Hey,

Do we have any clear indication as to why a C project compiled with an 11-year-old version of MSVC is so much faster than when compiled with VS2015?
Is it purely down to the CRT inclusions being more bloated and slower?

TWell

#42
The MS MT CRT is at fault.
Here are cl v19 compiled versions with the 2003 DDK libc.lib:
5.325s        asmc
6.521s        hjwa64-2015clib.exe
7.180s        hjwa32-2015clib.exe

jj2007

Quote from: johnsa on May 23, 2016, 11:04:21 PM
why a C project compiled with an 11 year old version of MSVC is so much faster than if compiled with VS2015 ??

Compilers develop. We are all running extremely old CPUs; new compilers optimise for the latest CPUs 8)

habran

Hi TWell,
There is a new HJWasm on Terraspace built with your tools, thank you :t
There is also improved source on GitHub.
The one you built above doesn't debug at source level in 32-bit.
Cod-Father