Hello everyone,
There is a new HJWasm release on Terraspace (http://www.terraspace.co.uk/hjwasm.html) with some sophisticated improvements 8)
I have built in two new options for the .SWITCH block:
option SWITCHSTYLE : ASMSTYLE (default)
option SWITCHSTYLE: CSTYLE (optional)
hutch, I hope you will be happy with this one: now you have an ASMSTYLE .SWITCH block. Thank you for your suggestion :t
Multiple case values can be separated with a comma ',' and continued on a new line if they don't fit on one.
If you don't need .DEFAULT, it can be omitted in both ASMSTYLE and CSTYLE.
This version also reduces memory consumption for multiple cases: in both styles, all values of a multiple case now share a single jump table entry.
You can switch to the other style and back again as many times as you want using
option SWITCHSTYLE
Here are some examples:
mov eax, 184h
.switch eax
.case 179h,180h,1c5h,17bh,17dh,
      182h,184h,185h
    mov edx,1d5h
.case 1d3h
    mov edx, 1d3h
.case 1f4h
    mov edx, 1f4h
.case 200h
    mov edx, 200h
.case 201h
    mov edx, 201h
.case 202h
    mov edx, 202h
.case 203h
    mov edx, 203h
.default
    mov edx, 0
.endswitch
option SWITCHSTYLE: CSTYLE
mov eax, 184h
.switch eax
.case 179h, 180h, 1c5h, 17bh, 17dh,
      182h, 184h, 185h
    mov edx, 1d5h
    .break
.case 1d3h
    mov edx, 1d3h
    .break
.case 1f4h
    mov edx, 1f4h
    .break
.case 200h
    mov edx, 200h
    .break
.case 201h
    mov edx, 201h
    .break
.case 202h
    mov edx, 202h
    .break
.case 203h
    mov edx, 203h
    .break
.default
    mov edx, 0
    .break
.endswitch
option SWITCHSTYLE : ASMSTYLE
.switch bl
.case 202
    mov edx, 202
.case 203
    mov edx, 203
.case 213
    mov edx, 213
.endswitch
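For readers coming from C, the first two examples above behave like the C switch below (a sketch only; the function name is made up and the registers are modeled as plain variables). Stacked case labels play the role of the comma-separated .case list, and each break corresponds to .break in CSTYLE or to the implicit end of a case in ASMSTYLE.

```c
#include <assert.h>

/* Hypothetical C counterpart of the first two .SWITCH examples above.
   Stacked case labels stand in for the comma-separated .case list;
   break stands in for .break (CSTYLE) or the implicit case end (ASMSTYLE). */
unsigned switch_example(unsigned eax)
{
    unsigned edx;

    switch (eax) {
    case 0x179: case 0x180: case 0x1c5: case 0x17b: case 0x17d:
    case 0x182: case 0x184: case 0x185:
        edx = 0x1d5;            /* one body shared by all listed values */
        break;
    case 0x1d3: edx = 0x1d3; break;
    case 0x1f4: edx = 0x1f4; break;
    case 0x200: edx = 0x200; break;
    case 0x201: edx = 0x201; break;
    case 0x202: edx = 0x202; break;
    case 0x203: edx = 0x203; break;
    default:    edx = 0;     break;   /* corresponds to .default */
    }
    return edx;
}
```

With eax = 184h, as in the examples, the result is 1d5h.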
Well done :t
With my 800-line testbed, it's much faster than ML and considerably faster than Japheth's last JWasm version:
OxPT_Assembler mlv615 ; 44.0kB, 1070 ms
OxPT_Assembler mlv10 ; 44.0kB, 1070 ms
OxPT_Assembler JWasm ; 44.0kB, 650 ms
OxPT_Assembler HJWasm32 ; 44.0kB, 580 ms
OxPT_Assembler HJWasm64 ; 44.0kB, 580 ms
OxPT_Assembler asmc ; 44.0kB, 480 ms
Note that there is no measurable speed difference between the 32-bit and 64-bit versions. The latter is 63% fatter, though ;)
Same pattern with the RichMasm source (17k lines):
OxPT_Assembler mlv10 ; 1200 ms
OxPT_Assembler mlv615 ; 1200 ms
OxPT_Assembler JWasm ; 880 ms
OxPT_Assembler HJWasm32 ; 820 ms
OPT_Assembler HJWasm64 ; 820 ms
OxPT_Assembler AsmC ; 740 ms
Now the surprise when building the MasmBasic library (28k lines):
OPT_Assembler mlv615 ; 6.9 secs
OxPT_Assembler JWasm ; 5.5 secs
OxPT_Assembler HJWasm32 ; 2.8 secs
OxPT_Assembler HJWasm64 ; 4.2 secs
OxPT_Assembler AsmC ; 2.1 secs
The 32-bit version is consistently 50% faster 8)
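The "50% faster" figure is a ratio of build times (4.2 s / 2.8 s = 1.5). Because "x% faster" and "x% less time" are easy to mix up when comparing timings like these, here is a small sketch of both conventions (the function names are made up for illustration).

```c
#include <assert.h>

/* How much faster build B is than build A, from wall-clock times:
   (time_a / time_b - 1) * 100. */
double percent_faster(double time_a, double time_b)
{
    assert(time_a > 0.0 && time_b > 0.0);
    return (time_a / time_b - 1.0) * 100.0;
}

/* How much wall-clock time build B saves relative to build A, in percent. */
double percent_time_saved(double time_a, double time_b)
{
    assert(time_a > 0.0);
    return (1.0 - time_b / time_a) * 100.0;
}
```

For the library build, percent_faster(4.2, 2.8) is 50%, while percent_time_saved(4.2, 2.8) is only about 33%.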
Thanks jj2007 :biggrin:
HJWasm32 has less code to deal with, so that could be the reason,
but I suspect that your machine is more experienced at running 32-bit code, hence the speed ;)
Here is an example of how much the .SWITCH block has been improved.
This code:
mov eax, 1c5h
.switch eax
.case 179h,17bh,17dh,182h,184h,187h,18bh,191h,198h,
      1a0h,1a2h,1a4h,1a7h,1ach,1afh,1b3h,1b5h,1b8h,
      1bch,1c5h,1c8h,1cbh,1cdh,1cfh,1d1h,1d3h,1d5h,
      1d7h,1d9h,1dbh,1f2h,1f4h,200h
    mov edx, 200h
.case 201h
    mov edx, 201h
.case 202h
    mov edx, 202h
.case 203h
    mov edx, 203h
.default
    mov edx, 0
.endswitch
Now it makes this:
?_021 LABEL NEAR
sub rsp, 472 ; 40001165 _ 48: 81. EC, 000001D8
mov eax, 453 ; 4000116C _ B8, 000001C5
jmp ?_023 ; 40001171 _ EB, 32
; Note: No jump seems to point here
mov edx, 512 ; 40001173 _ BA, 00000200
jmp ?_026 ; 40001178 _ E9, 000000FF
; Note: No jump seems to point here
mov edx, 513 ; 4000117D _ BA, 00000201
jmp ?_026 ; 40001182 _ E9, 000000F5
; Note: No jump seems to point here
mov edx, 514 ; 40001187 _ BA, 00000202
jmp ?_026 ; 4000118C _ E9, 000000EB
; Note: No jump seems to point here
mov edx, 515 ; 40001191 _ BA, 00000203
jmp ?_026 ; 40001196 _ E9, 000000E1
?_022: mov edx, 0 ; 4000119B _ BA, 00000000
jmp ?_026 ; 400011A0 _ E9, 000000D7
?_023: cmp eax, 515 ; 400011A5 _ 3D, 00000203
ja ?_022 ; 400011AA _ 77, EF
sub eax, 377 ; 400011AC _ 2D, 00000179
jc ?_022 ; 400011B1 _ 72, E8
lea rdx, ptr [?_025] ; 400011B3 _ 48: 8D. 15, 00000037(rel)
movzx rax, byte ptr [rax+rdx] ; 400011BA _ 48: 0F B6. 04 10
lea rdx, ptr [?_024] ; 400011BF _ 48: 8D. 15, 00000003(rel)
jmp qword ptr [rdx+rax*8] ; 400011C6 _ FF. 24 C2
?_024 label qword ; switch/case jump table
dq Unnamed_80000000_0 ; 400011C9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400011D1 _ 000000014000117D (d)
dq Unnamed_80000000_0 ; 400011D9 _ 0000000140001187 (d)
dq Unnamed_80000000_0 ; 400011E1 _ 0000000140001191 (d)
dq Unnamed_80000000_0 ; 400011E9 _ 000000014000119B (d)
[?_025] label byte
db 00 04 00 04 00 04 04 04 04 00 04 00 04 04 00 04 04 04 00 04 04 04
db 04 04 00 04 04 04 04 04 04 00 04 04 04 04 04 04 04 00 04 00 04 00
db 04 04 00 04 04 04 04 00 04 04 00 04 04 04 00 04 00 04 04 00 04 04
db 04 00 04 04 04 04 04 04 04 04 00 04 04 00 04 04 00 04 00 04 00 04
db 00 04 00 04 00 04 00 04 00 04 00 04 04 04 04 04 04 04 04 04 04 04
db 04 04 04 04 04 04 04 04 04 04 04 00 04 00 04 04 04 04 04 04 04 04
db 04 04 04 00 01 02 03 b8 44 00 00 00 eb 66 e8 10 fe ff ff e9 aa 00
Before:
?_021 LABEL NEAR
sub rsp, 472 ; 40001165 _ 48: 81. EC, 000001D8
mov eax, 453 ; 4000116C _ B8, 000001C5
jmp ?_023 ; 40001171 _ EB, 32
; Note: No jump seems to point here
mov edx, 512 ; 40001173 _ BA, 00000200
jmp ?_026 ; 40001178 _ E9, 000001FF
; Note: No jump seems to point here
mov edx, 513 ; 4000117D _ BA, 00000201
jmp ?_026 ; 40001182 _ E9, 000001F5
; Note: No jump seems to point here
mov edx, 514 ; 40001187 _ BA, 00000202
jmp ?_026 ; 4000118C _ E9, 000001EB
; Note: No jump seems to point here
mov edx, 515 ; 40001191 _ BA, 00000203
jmp ?_026 ; 40001196 _ E9, 000001E1
?_022: mov edx, 0 ; 4000119B _ BA, 00000000
jmp ?_026 ; 400011A0 _ E9, 000001D7
?_023: cmp eax, 515 ; 400011A5 _ 3D, 00000203
ja ?_022 ; 400011AA _ 77, EF
sub eax, 377 ; 400011AC _ 2D, 00000179
jc ?_022 ; 400011B1 _ 72, E8
lea rdx, ptr [?_025] ; 400011B3 _ 48: 8D. 15, 00000137(rel)
movzx rax, byte ptr [rax+rdx] ; 400011BA _ 48: 0F B6. 04 10
lea rdx, ptr [?_024] ; 400011BF _ 48: 8D. 15, 00000003(rel)
jmp qword ptr [rdx+rax*8] ; 400011C6 _ FF. 24 C2
?_024 label qword ; switch/case jump table
dq Unnamed_80000000_0 ; 400011C9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400011D1 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400011D9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400011E1 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400011E9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400011F1 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400011F9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001201 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001209 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001211 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001219 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001221 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001229 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001231 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001239 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001241 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001249 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001251 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001259 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001261 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001269 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001271 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001279 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001281 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001289 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001291 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 40001299 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400012A1 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400012A9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400012B1 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400012B9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400012C1 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400012C9 _ 0000000140001173 (d)
dq Unnamed_80000000_0 ; 400012D1 _ 000000014000117D (d)
dq Unnamed_80000000_0 ; 400012D9 _ 0000000140001187 (d)
dq Unnamed_80000000_0 ; 400012E1 _ 0000000140001191 (d)
dq Unnamed_80000000_0 ; 400012E9 _ 000000014000119B (d)
db 00 24 01 24 02 24 24 24 24 03 24 04 24 24 05 24 24 24 06 24 24 24
db 24 24 07 24 24 24 24 24 24 08 24 24 24 24 24 24 24 09 24 0a 24 0b
db 24 24 0c 24 24 24 24 0d 24 24 0e 24 24 24 0f 24 10 24 24 11 24 24
db 24 12 24 24 24 24 24 24 24 24 13 24 24 14 24 24 15 24 16 24 17 24
db 18 24 19 24 1a 24 1b 24 1c 24 1d 24 24 24 24 24 24 24 24 24 24 24
db 24 24 24 24 24 24 24 24 24 24 24 1e 24 1f 24 24 24 24 24 24 24 24
db 24 24 24 20 21 22 23 b8 44
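To put numbers on the improvement, here is the size arithmetic for the two listings above as a C sketch. The counts are read off the listings: the multiple .case holds 33 values, there are three single cases plus .default, and the byte index table spans 179h..203h (the cmp eax, 515 / sub eax, 377 pair). The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes of the two tables in the 64-bit .SWITCH listings above:
   a byte index table with one entry per value from min to max, and a
   jump table of dq (8-byte) entries.
   Old scheme: one jump entry per case value, plus one for .default.
   New scheme: one jump entry per .case block, plus one for .default. */
typedef struct {
    size_t index_bytes;   /* max - min + 1 bytes */
    size_t jump_bytes;    /* jump_entries * entry_size */
} SwitchTables;

SwitchTables table_sizes(unsigned min, unsigned max,
                         size_t jump_entries, size_t entry_size)
{
    SwitchTables t;
    assert(max >= min);
    t.index_bytes = (size_t)(max - min) + 1;
    t.jump_bytes = jump_entries * entry_size;
    return t;
}
```

table_sizes(0x179, 0x203, 33 + 3 + 1, 8) gives a 139-byte index table and a 296-byte jump table (the 37 dq lines of the "Before" listing); table_sizes(0x179, 0x203, 1 + 3 + 1, 8) keeps the 139-byte index table but shrinks the jump table to 40 bytes.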
Quote from: habran on May 17, 2016, 01:58:39 PM
Thanks jj2007 :biggrin:
HJWasm32 has to deal with less code so it can be the reason,
but I suspect that your machine is more experienced in running 32 bit and hence the speed ;)
I have here example how much has the .SWITCH block being improved:
The "less code" argument would convince me if my 27k lines of code consisted largely of .switch structures, but for compatibility reasons I am still using good ol' Switch_ (http://masm32.com/board/index.php?topic=94.msg57249#msg57249) (remind me to set up a speed & size comparison between Switch_ and .switch ...)
But the "more experienced in running 32 bit" argument is certainly logical :lol:
@jj2007
here is a PellesC 8 x64 build for a speed test.
@TWell: the library builds exactly as fast as with HJWasm32; RichMasm is a bit faster:
OxPT_Assembler JWasm ; 880 ms
OxPT_Assembler HJWasm32 ; 820 ms
OxPT_Assembler HJWasm64 ; 820 ms
OxPT_Assembler HJwasm64poc ; 770 ms
jj2007,
Test this one please
Quote from: habran on May 18, 2016, 06:15:47 AM
jj2007,
Test this one please
As fast as the 32-bit version with the RichMasm source,
but with the library, the 32-bit version is exactly 50% faster.
My test with m32lib *.asm files
HJWasm64.exe -c -coff -q \masm32\m32lib\*.asm
HJWasm32.exe 16.982s
HJWasmGcc.exe 16.712s
HJWasm64poc.exe 21.224s
HJWasm.exe 29.547s
HJWasm64.exe 30.14s
This is odd :icon_confused:
OK, the last one was built with VS15 with full optimization;
this one is built with GCC via C::B (Code::Blocks).
Let's see which one is faster.
This one, MB library:
last one 2.77 secs (version 504866 bytes, 18 May)
32-bit 3.15 secs (version 361472 bytes, 16 May)
VS15 has fantastic optimisations :eusa_boohoo:
Btw the last one loads remarkably well with OllyDbg, a well-known 32-bit debugger 8)
Does that mean that GCC is the best version?
The 32-bit GCC version is fast.
m32lib again, on another PC:
HJWasmGcc.exe 7.368s 7.659s
HJWasm32.exe 9.153s 9.729s
HJWasm64poc.exe 10.626s 10.664s
HJWasm64Gcc.exe 16.941s 17.214s
The 64-bit build is not so fast; compiled with GCC v5 using option -O2:
@SET SRC=..\HJWasm-master
@SET CC=gcc -c -O2 -m64 -I..\HJWasm-master\H -DWIN64=1
@REM hjwasm64gcc.exe:
for %%c in (%SRC%\*.c) do %CC% %%c
gcc -s main.o apiemu.o assemble.o assume.o atofloat.o backptch.o bin.o branch.o cmdline.o codegen.o coff.o condasm.o context.o cpumodel.o data.o dbgcv.o directiv.o elf.o end.o equate.o errmsg.o expans.o expreval.o extern.o fastpass.o fixup.o fpfixup.o hll.o input.o invoke.o label.o linnum.o listing.o loop.o lqueue.o macro.o mangle.o memalloc.o msgtext.o omf.o omffixup.o omfint.o option.o parser.o posndir.o preproc.o proc.o queue.o reswords.o safeseh.o segment.o simsegm.o string.o symbols.o tbyte.o tokenize.o types.o -o hjwasm64gcc.exe
@DEL *.o
Is something wrong in my test? :icon_confused:
I have built mine with GCC -O3, and you can see a reduction in size.
Are you saying that -O2 produces faster code than -O3?
I can also see that Pelles C produces even smaller code than GCC.
Are you sure that HJWasm64poc is the fastest 64-bit version?
Let's show only 64-bit speeds for all versions.
Quote from: habran on May 18, 2016, 05:01:55 PM
Let show only 64 bit speed with all versions
OK, but don't distort the competition by smuggling in (Reply #9) 32-bit versions called HJWasm64 :eusa_naughty:
GCC yes, but no doping please :t
I was not aware of that :icon_eek:
Here it is:
mingw32-gcc.exe -O3 -DWIN64 -DNDEBUG -I.\H -I"C:\Program Files (x86)\mingw-w64\i686-4.9.2-posix-dwarf-rt_v3-rev0\mingw32\i686-w64-mingw32\bin" -I"C:\Program Files (x86)\CodeBlocks\MinGW\bin" -c "C:\Users\Brane
-m64 isn't in the command line, so the result was 32-bit.
It seems you picked the right options this time - almost as fast as AsmC :t
OxPT_Assembler HJWasm32 ; 3.15
OxPT_Assembler HJWasm64 ; 2.80 secs
Btw where are your bottlenecks? Is loading string arrays and finding matches in these strings one of them?
- the FAST option of Instr_ is typically about twice as fast as CRT strstr, but 3..4 times as fast when used with
MasmBasic string arrays (Intel Core i5 timings for counting a rare word in an 800 MB file with 6 million lines):
232 ms for fast Instr_
795 ms for "normal" Instr_
999 ms for Masm32 InString
929 ms for CRT strstr
;)
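For reference, the "CRT strstr" row above presumably corresponds to the plain approach of calling strstr repeatedly and advancing past each hit. A minimal C sketch of that baseline (the function name is hypothetical; this is not MasmBasic's Instr_ algorithm):

```c
#include <assert.h>
#include <string.h>

/* Count non-overlapping occurrences of word in text with repeated
   CRT strstr calls: the plain baseline, not the FAST Instr_ variant. */
size_t count_occurrences(const char *text, const char *word)
{
    size_t count = 0;
    size_t len = strlen(word);

    assert(len > 0);   /* an empty pattern would match at every position */
    for (const char *p = strstr(text, word); p != NULL;
         p = strstr(p + len, word))
        ++count;
    return count;
}
```

Each call rescans from the last hit, so the cost is dominated by how fast strstr itself skips over non-matching text, which is where a specialized search can win.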
Quote from: TWell on May 18, 2016, 05:38:08 PM
-m64 isn't in commandline, so resut was 32-bit.
Oops, you are right -
doping alarm :dazzled:
Btw you could check if you have an old GCC version. Google finds plenty of hits for
"gcc 64-bit slower than 32-bit", many of them from around spring 2014. So maybe the developers have saved the honour of 64-bit compilers in the meantime 8)
OK, let's see this one: is it x64? ::)
Quote from: TWell on May 18, 2016, 04:25:26 PM
Is in my test something wrong :icon_confused:
OPT_Assembler hjwasm32gcc3 ; 2.7
OxPT_Assembler hjwasm64gcc ; 6.1
OPT_Assembler AsmC ; 2.1 secs
::)
Quote from: habran on May 18, 2016, 07:34:55 PM
OK, let see this one, is it x64 ::)
2.55 secs, so far your best one :t
(one little problem: Olly says it's 32-bit code...)
Quote from: nidud on May 18, 2016, 09:02:29 PM
The switch still trashes registers
That doesn't seem to be the only problem. Here is a snippet; try it with the latest AsmC and HJWasm versions 8)
include \masm32\include\masm32rt.inc
.code
start:
    m2m edi, -5
    .Repeat
        print chr$(13, 10)
        print str$(edi), 9
        .switch edi
        .case -4
            print "case -4 "
            ; .break
        .case -2
            print "case -2 "
            ; .break
        .case 0
            print "case 0 "
            .break
        .case 2
            print "case +2 "
            .break
        .case 4
            print "case +4 "
            .break
        .Default
            print "default"
        .Endsw
        inc edi
    .Until sdword ptr edi>5
    inkey chr$(13, 10, "--- ok? ---")
    exit
end start
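To make the expected output easier to check, here is what one pass of the test loop would print if the .switch followed C fall-through rules (CSTYLE, with the two .break lines commented out as above); under ASMSTYLE each .case would end on its own instead. This is a C sketch with hypothetical function names, not the code either assembler generates.

```c
#include <assert.h>
#include <string.h>

/* Append s to the NUL-terminated string in buf without overflowing it. */
static void append(char *buf, size_t size, const char *s)
{
    strncat(buf, s, size - strlen(buf) - 1);
}

/* Text printed for one value of edi under C fall-through semantics. */
const char *cstyle_output(int edi, char *buf, size_t size)
{
    assert(size > 0);
    buf[0] = '\0';
    switch (edi) {
    case -4:
        append(buf, size, "case -4 ");
        /* fall through: the .break here is commented out */
    case -2:
        append(buf, size, "case -2 ");
        /* fall through: the .break here is commented out */
    case 0:
        append(buf, size, "case 0 ");
        break;
    case 2:
        append(buf, size, "case +2 ");
        break;
    case 4:
        append(buf, size, "case +4 ");
        break;
    default:
        append(buf, size, "default");
        break;
    }
    return buf;
}
```

So for edi = -4 a C-style switch would print "case -4 case -2 case 0 ", and for edi = -2 "case -2 case 0 ".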
With AsmC (no options, but with .break added instead), the .break causes an exit from the .Repeat loop. Is that "by design"?
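For comparison, in C a break inside a switch that sits in a loop terminates only the switch, so the loop keeps going. Whether .break is meant to mirror C here is exactly the open question; the sketch below (hypothetical function name) only illustrates the C rule for the same -5..5 range as the .Repeat/.Until loop.

```c
#include <assert.h>

/* In C, break inside a switch leaves only the switch; the enclosing
   loop keeps running. Mirrors the .Repeat/.Until loop (edi = -5 .. 5)
   and counts how many iterations complete. */
int iterations_with_break(void)
{
    int edi = -5;
    int iterations = 0;

    do {
        switch (edi) {
        case 0:
        case 2:
        case 4:
            break;              /* exits the switch only */
        default:
            break;
        }
        ++iterations;           /* still reached after every break */
        ++edi;
    } while (edi <= 5);
    assert(edi == 6);
    return iterations;
}
```

Under C rules all 11 iterations run.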
Hi nidud,
The code below is a brilliant idea and I like it a lot; however, it is not applicable in 64-bit because it would force a linker LARGEADDRESSAWARE error. While I was writing the code my mind was focused on 64-bit, and 32-bit was just a conversion from 64-bit; now I see that I have to start thinking from the 32-bit side when I write 32-bit code.
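cmp eax,min                 ; below the lowest case value?
jl endsw                    ;   yes: skip to endsw
cmp eax,max                 ; above the highest case value?
jg endsw                    ;   yes: skip to endsw
push eax                    ; save the switch value
movzx eax,index[eax-min]    ; byte index of the matching case
mov eax,table[eax*4]        ; address of that case's code
xchg eax,[esp]              ; restore eax, leave the address on the stack
retn                        ; "return" to, i.e. jump to, the case code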
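The same two-level dispatch (bounds check, byte index table, small table of case addresses) can be modeled in C with function pointers standing in for the case labels. This is a hypothetical sketch with a made-up range of 10..17, not what HJWasm or AsmC emits; note that the C version needs no push/xchg trick, since the compiler handles the jump.

```c
#include <assert.h>

typedef int (*Handler)(void);

static int on_group(void)   { return 1; }   /* a multiple .case block */
static int on_single(void)  { return 2; }   /* a single .case */
static int on_default(void) { return 0; }   /* .default */

enum { CASE_MIN = 10, CASE_MAX = 17 };

/* One byte per value in CASE_MIN..CASE_MAX: 0 = group, 1 = single,
   2 = default. Here 10, 12 and 17 form the group and 15 is the single. */
static const unsigned char index_table[CASE_MAX - CASE_MIN + 1] = {
    0, 2, 0, 2, 2, 1, 2, 0
};
static const Handler jump_table[] = { on_group, on_single, on_default };

int dispatch(int value)
{
    /* Unsigned arithmetic lets one compare reject values on both sides of
       the range (the listing earlier uses cmp/ja plus sub/jc for this). */
    unsigned offset = (unsigned)value - (unsigned)CASE_MIN;

    if (offset > (unsigned)(CASE_MAX - CASE_MIN))
        return on_default();
    assert(index_table[offset] < sizeof jump_table / sizeof jump_table[0]);
    return jump_table[index_table[offset]]();
}
```

Values inside the range that have no case (11, 13, 14, 16) map to the default slot through the index table, so the jump table needs only one entry per case block.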
In the case of jj2007's source, I don't think any ASM programmer would write code that trashes registers needed for the next iteration.
You can, of course, push a register on the stack, then purposely overwrite that memory on the stack, and then complain that the assembler is no good because it let you overwrite the stack space ::)
If that happened, I would not feel sorry for that "programmer".
Quote from: habran on May 18, 2016, 11:35:35 PM
In the case of jj2007 source, I don't think that any ASM programmer would write such a code that would trash registers that are needed for next iteration ... would not fill sorry for that "programmer"
So far, I have tried to be helpful. Show me one occasion where I insulted you as a "programmer" or similar.
Besides, this is obviously valid code, since edi is a non-volatile register. I perfectly understand why you are pissed off, but please concentrate on your homework instead of attacking others.
My intention was not to attack you, jj2007. I am grateful for your help and cooperation, and I have already said how much I appreciate you as a programmer; however, I am sure that you would never write this construction in your programs, and you have to admit that :biggrin:
"programmer" was not aimed at you but at someone who would write a program that overwrites his own stack.
Quote from: habran on May 19, 2016, 12:21:33 AM
I am sure that you would never write this construction in your programs, and you have to admit that :biggrin:
OK, let's declare it a misunderstanding. But now I am curious: Where in my code do I trash the stack that I need later on?
Quote from: nidud on May 19, 2016, 12:07:59 AM
Ah, finally: Did you RTFM :lol:
No, I was busy reading the rest of the Internet 8)
Anyway, latest results from my switch testbed:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with HJWasm32
24 ms case 260, MB Switch_ table
230 ms case 260, MB Switch_ chain
455 ms case 260, Masm32 switch
31 ms case 260, HJWasm .Switch
6 ms case 260, AsmC .Switch
24 ms case 196, MB Switch_ table
178 ms case 196, MB Switch_ chain
341 ms case 196, Masm32 switch
38 ms case 196, HJWasm .Switch
6 ms case 196, AsmC .Switch
23 ms case 132, MB Switch_ table
127 ms case 132, MB Switch_ chain
229 ms case 132, Masm32 switch
22 ms case 132, HJWasm .Switch
6 ms case 132, AsmC .Switch
23 ms case 68, MB Switch_ table
76 ms case 68, MB Switch_ chain
120 ms case 68, Masm32 switch
40 ms case 68, HJWasm .Switch
7 ms case 68, AsmC .Switch
23 ms case 4, MB Switch_ table
24 ms case 4, MB Switch_ chain
6 ms case 4, Masm32 switch
38 ms case 4, HJWasm .Switch
6 ms case 4, AsmC .Switch
2989 bytes for MbTable
4840 bytes for MbChain
4799 bytes for Masm32
6978 bytes for hjwasm
4208 bytes for asmc
The AsmC rows (the last row of each group) were added "by hand" because obviously you can't assemble the source with both assemblers at the same time. If you want to build it yourself, open the source in RichMasm and press Ctrl End. The OPT_Assembler rows should speak for themselves: OxPT_ marks a disabled one (RichMasm looks for a case-sensitive OPT_). If two options are active, the last one is valid (I know, I know, modern IDEs have a project options menu somewhere where you can set the assembler if you find the right menu item; RM is very old-fashioned, sorry).
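The selection rule just described (only lines starting with the case-sensitive OPT_ count, OxPT_ disables a line, and the last active line wins) can be sketched in C as follows. This is a hypothetical model for illustration only; RichMasm's actual parsing may differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the OPT_Assembler rule: only lines starting with
   the case-sensitive prefix "OPT_Assembler" are active, and the last active
   line wins. Copies the assembler name (up to a blank or ';') into name and
   returns it, or returns NULL if no line is active. */
const char *active_assembler(const char *const lines[], size_t count,
                             char *name, size_t size)
{
    static const char prefix[] = "OPT_Assembler";
    const char *found = NULL;

    assert(size > 0);
    for (size_t i = 0; i < count; ++i)
        if (strncmp(lines[i], prefix, sizeof prefix - 1) == 0)
            found = lines[i] + sizeof prefix - 1;   /* later lines override */
    if (found == NULL)
        return NULL;

    found += strspn(found, " \t");                  /* skip blanks */
    size_t len = strcspn(found, " \t;");            /* name ends at blank or ';' */
    if (len >= size)
        len = size - 1;
    memcpy(name, found, len);
    name[len] = '\0';
    return name;
}
```

With the lines "OxPT_Assembler mlv615", "OPT_Assembler HJWasm32 ; 3.15" and "OPT_Assembler AsmC ; 2.1 secs", the result is AsmC.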
The 64-bit version is faster:
OxPT_Assembler hjwasm32msvcrt ; 7.6
OxPT_Assembler hjwasm64msvcrt ; 6.2
OxPT_Assembler AsmC ; 2.5 secs
2.66 secs :t
Here are all my current timings:
HJWasmTWell ; 7.8 secs (9 May)
mlv10 ; 7.8 secs
mlv615 ; 7.0 secs - use for release version
JWasm ; 5.5 secs
HJWasm32 ; 3.15
HJWasm64 ; 2.80 secs
HJwasm64poc ; 2.75
hjwasm32gcc3 ; 2.7
hjwasm64gcc ; 6.1
HJWasm64Habran ; 2.55 secs, but it's 32-bit code
hjwasm32msvcrt ; 7.6
hjwasm64msvcrt ; 6.2
hjwasm32msv13 ; 2.65
AsmC ; 2.5 secs (used to be 2.1...)
In practice, I use AsmC for testing, not only because it's fastest but also because it gives direct feedback, i.e. you can see
Assembling: C:\Masm32\MasmBasic\libtmpAA.asm
Assembling: C:\Masm32\MasmBasic\libtmpAB.asm
Assembling: C:\Masm32\MasmBasic\libtmpAC.asm
Assembling: C:\Masm32\MasmBasic\libtmpAD.asm
while it is assembling. JWasm and ML 6.15 do the same; most others make you wait until everything is complete, which is less nice to watch. But that is a very personal preference, of course 8)
Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:
Quote from: jj2007 on May 19, 2016, 10:03:10 PM
Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:
Thanks, Tim :t
2669 ms, 9459844 time(s): address 004386A0 _SymFind 004386a0 f symbols.obj
2647 ms, 4850279 time(s): address 00420400 _my_fgets 00420400 f input.obj
1383 ms, 10972418 time(s): address 0043A300 _get_id 0043a300 f tokenize.obj
1242 ms, 4800861 time(s): address 0043A6A0 _Tokenize 0043a6a0 f tokenize.obj
1136 ms, 10357549 time(s): address 00434DF0 _FindResWord 00434df0 f reswords.obj
1041 ms, 9461863 time(s): address 004384D0 _hashpjw 004384d0 f symbols.obj
921 ms, 16118880 time(s): address 0043A540 _GetToken 0043a540 f tokenize.obj
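The profile shows _SymFind and _hashpjw called almost the same number of times (about 9.46 million each), which is consistent with one hash computation per symbol lookup. Below is a sketch of that pattern using the textbook P.J. Weinberger (PJW/ELF) hash and a chained hash table. The table size and the sym_add/sym_find names are hypothetical, and HJWasm's own hashpjw in symbols.obj may differ in details such as case folding.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Textbook PJW/ELF string hash (not copied from HJWasm's source). */
uint32_t hashpjw(const char *s)
{
    uint32_t h = 0;
    while (*s) {
        h = (h << 4) + (unsigned char)*s++;
        uint32_t g = h & 0xF0000000u;   /* top 4 bits about to overflow */
        if (g != 0)
            h ^= g >> 24;               /* fold them back into the low bits */
        h &= ~g;
    }
    return h;
}

#define TABLE_SIZE 2003u   /* hypothetical bucket count */

typedef struct Sym {
    const char *name;
    struct Sym *next;       /* next symbol in the same bucket */
} Sym;

static Sym *buckets[TABLE_SIZE];

void sym_add(Sym *sym)
{
    uint32_t b = hashpjw(sym->name) % TABLE_SIZE;
    sym->next = buckets[b];
    buckets[b] = sym;
}

/* One hash per lookup, then a strcmp walk along the bucket's chain. */
Sym *sym_find(const char *name)
{
    for (Sym *s = buckets[hashpjw(name) % TABLE_SIZE]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}
```

In this structure a lookup costs one full pass over the name for the hash plus one strcmp per chained entry, so shorter chains (a larger or better-distributed table) and cheaper hashing are the obvious knobs.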
3.0 secs, so far the best 64-bit version (AsmC: 2.5 secs).
But the 32-bit version is 10% faster:
OxPT_Assembler hjwasm642005DDK ; 3.0
OPT_Assembler hjwasm322005DDK ; 2.7
OxPT_Assembler AsmC ; 2.5 secs
What about _SymFind and _my_fgets? Long and complicated, or is there a chance to give them a boost?
Hey,
Do we have any clear indication as to why a C project compiled with an 11 year old version of MSVC is so much faster than if compiled with VS2015 ??
Is it purely down to the CRT inclusions being more bloated/less performant?
It's the fault of the MS multithreaded (MT) CRT.
Here is a version compiled with cl v19 and linked with libc.lib from the 2003 DDK:
5.325s asmc
6.521s hjwa64-2015clib.exe
7.180s hjwa32-2015clib.exe
Quote from: johnsa on May 23, 2016, 11:04:21 PM
why a C project compiled with an 11 year old version of MSVC is so much faster than if compiled with VS2015 ??
Compilers develop. We are all running extremely old CPUs, new compilers optimise for the latest CPUs 8)
Hi TWell,
There is a new HJWasm on Terraspace built with your tools, thank you :t
as well as improved source on GitHub.
The one you built above doesn't allow source-level debugging in 32-bit.
Sorry JJ, I have been busy fixing HJWasm these days :biggrin:
Try it now please, it is updated on Terraspace 8)
Quote from: habran on May 24, 2016, 12:25:47 AM
This one you built above doesn't debug on source level in 32 bit
AVX support was missing too :redface:
I was only testing the m32lib build, as it involves a lot of file access.
PS: Does anyone have the old Vista DDK? Does it contain libc.lib (VS 2005?)
Quote from: habran on May 24, 2016, 12:33:47 AM
Sorry JJ, I was busy these days to fix HJWasm :biggrin:
Try it now please, it is updated on Terraspace 8)
OxPT_Assembler HJWasm32 ; 2.7
OxPT_Assembler HJWasm64 ; 3.0 secs, and yes, it's 64-bit code
OxPT_Assembler AsmC ; 2.5 secs
:t
Thanks JJ :t
So, taking into consideration that ASMC has its most important parts translated to asm and has less code to run, that is amazing speed.
With HJWasm64 you are testing code written and optimized for 32-bit, which is why it is slower than HJWasm32;
we should try the opposite. I am planning to write some code optimized for x64, and then we will see how it performs 8)
From VC6 samples:
# When building single-threaded applications you can link your executable
# with either LIBC, LIBCMT, or CRTDLL, although LIBC will provide the best
# performance.
AFAIK LIBCMT is for a static build and LIBC for a dynamic build.
Where is that LIBC.LIB which you gave me the link for?
Isn't it from msvc2005?
libc.lib is a static library.
libc.lib can be found in the Windows NT 5.2 DDK (Server 2003 SP1), and an x64 version in 5.2.3790.2075.51.PlatformSDK_Svr2003R2_rtm too.
I can't inspect Vista DDK :(
Hi TWell,
The last one you built works fine, but I can't test it for speed.
I have fixed some minor errors in hll.c and added a new feature to the .SWITCH block, so we will upgrade tonight or tomorrow.
If you give me your email address, I will send you the new hll.c for testing.
With VisualCppBuildTools2015
5.418s asmc
8.596s HJWasm64.exe
12.372s hjwa64-2015.exe
Not bad eh??
VS2015 sucks :(
I prefer this one:
OxPT_Assembler HJWasm32 ; 2.7
OxPT_Assembler HJWasm64 ; 3.0 secs, and yes, it's 64-bit code
OxPT_Assembler AsmC ; 2.5 secs
:t
Quote from: jj2007 on May 19, 2016, 10:03:10 PM
In practice, I use AsmC for testing, not only because it's fastest but also because it gives direct feedback, i.e. you can see
Assembling: C:\Masm32\MasmBasic\libtmpAA.asm
Assembling: C:\Masm32\MasmBasic\libtmpAB.asm
Assembling: C:\Masm32\MasmBasic\libtmpAC.asm
Assembling: C:\Masm32\MasmBasic\libtmpAD.asm
while it is assembling. JWasm and ML 6.15 do the same, most others let you wait until everything is complete, which is less nice to watch. But that is a very personal preference, of course 8)
Btw it would be nice if Nidud or Habran or both could identify the innermost loop that makes the assembly slow. We are experts here in speeding up C code... :badgrin:
Re "others let you wait until everything is complete": would fflush(..) after each module help?
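A minimal sketch of what that suggestion would look like, assuming the assembler prints one progress line per module. When stdout goes to a pipe (as when an IDE captures the output), the C runtime usually buffers it fully, so without an explicit flush the lines only appear at exit. The function name is hypothetical; whether this is enough for RichMasm depends on how it reads the output.

```c
#include <assert.h>
#include <stdio.h>

/* Print one progress line and push it out immediately, even when the
   stream is fully buffered (e.g. redirected to a pipe).
   Returns 0 on success, EOF if the write or the flush fails. */
int report_module(FILE *out, const char *path)
{
    assert(out != NULL && path != NULL);
    if (fprintf(out, "Assembling: %s\n", path) < 0)
        return EOF;
    return fflush(out);
}
```

An alternative with the same effect is calling setvbuf(stdout, NULL, _IOLBF, BUFSIZ) once at startup, at the cost of a flush on every line.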
New HJWasm uploaded on Terraspace (http://www.terraspace.co.uk/hjwasm.html) with some bug fixes and, hopefully, some speed improvements for the .SWITCH HLL block.
Timings for building the MB library are unchanged.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with HJWasm32
20 ms case 260, MB Switch_ table
197 ms case 260, MB Switch_ chain
370 ms case 260, Masm32 switch
46 ms case 260, HJWasm .Switch
19 ms case 196, MB Switch_ table
146 ms case 196, MB Switch_ chain
277 ms case 196, Masm32 switch
46 ms case 196, HJWasm .Switch
19 ms case 132, MB Switch_ table
104 ms case 132, MB Switch_ chain
186 ms case 132, Masm32 switch
46 ms case 132, HJWasm .Switch
19 ms case 68, MB Switch_ table
62 ms case 68, MB Switch_ chain
98 ms case 68, Masm32 switch
46 ms case 68, HJWasm .Switch
19 ms case 4, MB Switch_ table
20 ms case 4, MB Switch_ chain
5 ms case 4, Masm32 switch
46 ms case 4, HJWasm .Switch
2989 bytes for MbTable
4840 bytes for MbChain
4799 bytes for Masm32
5729 bytes for hjwasm
Thanks JJ,
It looks like we at least have stable speed; considering it is written in C, it is pretty good.
It is probably possible to make it a little bit faster with some more optimization.
Hi JJ,
Can you please test this build with the same sources as the last one, to see if there is a difference in speed?
Aren't the timings JJ provided for the run-time execution of the switch rather than for compile time?
Yes, that is what I want to be tested.
I think I may have a faster sorting routine in this one.
Quote from: habran on June 03, 2016, 09:33:15 PM
Can you please test this build with the same sources you did with the last one to see if there is the difference in speed?
Hi Habran,
Build speed for the MasmBasic library is very good, only 10% slower than AsmC now :t
Would be nice to flush the console after each assembly, though.
Quote from: johnsa on June 03, 2016, 10:34:51 PM
Aren't the timings JJ provided run-time execution of the switch rather than compile-time related?
Here is run-time execution of the switch:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with AsmC
20 ms case 260, MB Switch_ table
193 ms case 260, MB Switch_ chain
374 ms case 260, Masm32 switch
5 ms case 260, AsmC .Switch
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with HJWasm32
20 ms case 260, MB Switch_ table
190 ms case 260, MB Switch_ chain
374 ms case 260, Masm32 switch
48 ms case 260, HJWasm .Switch
And the size of the generated switch codes:
2989 bytes for MbTable
4840 bytes for MbChain
4799 bytes for Masm32
4201 bytes for asmc
5729 bytes for hjwasm
The AsmC .Switch is clearly fastest, while the MasmBasic Switch_ macro generates the smallest code for high case numbers (where it auto-selects the compact table version).
Thanks JJ, it looks like the former one is 2 ms faster :(
So I downloaded HJWasm, and lo and behold, when I tried to email it to myself,
Google (the Gmail scanner) said it contains malware... seeesh
So I know Habran would never include such a thing, but which file could be the cause of the false positive?
There is definitely no malware in the archives. I would assume its heuristic scanning picks up that the exe generates code it doesn't like (possibly based on an unfamiliar name or origin), or alternatively the fact that the archive contains asm files.
Is there anyway to get a more detailed report from the scan as to exactly what it's not happy with?
If you are in doubt, upload the file to Jotti: nothing for HJWasm32 (https://virusscan.jotti.org/en-US/filescanjob/auaptrk7ij). As johnsa wrote, the AV scanners use heuristics, and those are a PITA (Pain In The A**). As soon as they see something that was apparently not built with MSVC or GCC, they make racist remarks about assembler code being dangerous etc 8)
Btw we have a dedicated sub-forum for that: AV Software sh*t list (http://masm32.com/board/index.php?board=23.0)
New HJWasm uploaded on Terraspace with some more bug fixes in the .SWITCH block.