Testing nidud's new .Switch macro against MasmBasic's Switch_ (today's version, download here (http://masm32.com/board/index.php?topic=94.0)). Timings on other processors would be welcome. The test has no inkey, so run it from a DOS prompt.
Note the tradeoff between size and speed for the MasmBasic "ret" and "jmp" versions; the latter produces code that is almost identical to AsmC's .Switch macro (I still have to find out where the 8 or so extra bytes come from :( ). What is remarkable is that there is already a speed gain for more than 4 cases, compared to the Masm32 switch macro.
Note also (quote from \Masm32\MasmBasic\MbGuide.rtf, but equally valid for AsmC's .Switch):
- avoid using Switch_ with few cases that are far apart, e.g. case -1000, case 1, case 1000, as this creates a huge jump table with many default entries; the Masm32 switch macro will be more size-efficient
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with AsmC
11 ms case 260, MB Switch_ ret
8 ms case 260, MB Switch_ jmp
370 ms case 260, Masm32 switch
5 ms case 260, AsmC .Switch
7 ms case 196, MB Switch_ ret
5 ms case 196, MB Switch_ jmp
286 ms case 196, Masm32 switch
5 ms case 196, AsmC .Switch
7 ms case 132, MB Switch_ ret
5 ms case 132, MB Switch_ jmp
189 ms case 132, Masm32 switch
5 ms case 132, AsmC .Switch
7 ms case 68, MB Switch_ ret
5 ms case 68, MB Switch_ jmp
98 ms case 68, Masm32 switch
5 ms case 68, AsmC .Switch
8 ms case 4, MB Switch_ ret
5 ms case 4, MB Switch_ jmp
10 ms case 4, Masm32 switch
5 ms case 4, AsmC .Switch
2999 bytes for MBSret
4211 bytes for MBSjmp
4799 bytes for Masm32
4200 bytes for asmc
MB Switch_ in jmp mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
MB Switch_ in ret mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
Source and exe attached; for building your own version, RichMasm works best, as it creates the AsmUsed$() macro (see above, "Assembled with AsmC"), which allows selective inclusion of Nidud's .Switch based on the assembler used. But the *.asc source opens in Wordpad, too.
P.S.: The macro assembles fine with everything from ML 6.15 .. 10, JWasm, HJWasm and AsmC; the ML versions build about half as fast, though, due to the excessive number of cases tested (300+).
Note also that there is cases=... in line 21; ML chokes above ca. 920 cases, AsmC seems not so impressed, but prepare for fat executables, e.g. for 10,000 cases:
109663 bytes for MBSret
149663 bytes for MBSjmp
179345 bytes for Masm32
149658 bytes for asmc
8)
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Assembled with AsmC
5 ms case 260, MB Switch_ ret
4 ms case 260, MB Switch_ jmp
294 ms case 260, Masm32 switch
4 ms case 260, AsmC .Switch
5 ms case 196, MB Switch_ ret
4 ms case 196, MB Switch_ jmp
226 ms case 196, Masm32 switch
3 ms case 196, AsmC .Switch
5 ms case 132, MB Switch_ ret
4 ms case 132, MB Switch_ jmp
149 ms case 132, Masm32 switch
4 ms case 132, AsmC .Switch
6 ms case 68, MB Switch_ ret
4 ms case 68, MB Switch_ jmp
81 ms case 68, Masm32 switch
3 ms case 68, AsmC .Switch
6 ms case 4, MB Switch_ ret
4 ms case 4, MB Switch_ jmp
4 ms case 4, Masm32 switch
3 ms case 4, AsmC .Switch
2999 bytes for MBSret
4211 bytes for MBSjmp
4799 bytes for Masm32
4200 bytes for asmc
MB Switch_ in jmp mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
MB Switch_ in ret mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
Assembled with AsmC
4 ms case 260, MB Switch_ ret
4 ms case 260, MB Switch_ jmp
313 ms case 260, Masm32 switch
4 ms case 260, AsmC .Switch
4 ms case 196, MB Switch_ ret
3 ms case 196, MB Switch_ jmp
218 ms case 196, Masm32 switch
6 ms case 196, AsmC .Switch
4 ms case 132, MB Switch_ ret
3 ms case 132, MB Switch_ jmp
147 ms case 132, Masm32 switch
7 ms case 132, AsmC .Switch
5 ms case 68, MB Switch_ ret
4 ms case 68, MB Switch_ jmp
77 ms case 68, Masm32 switch
4 ms case 68, AsmC .Switch
5 ms case 4, MB Switch_ ret
6 ms case 4, MB Switch_ jmp
6 ms case 4, Masm32 switch
4 ms case 4, AsmC .Switch
2999 bytes for MBSret
4211 bytes for MBSjmp
4799 bytes for Masm32
4200 bytes for asmc
MB Switch_ in jmp mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
MB Switch_ in ret mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
Quote from: nidud on April 24, 2016, 08:38:17 PM
It's a slick solution but impossible to implement given the stack is uneven at the case entry
That could indeed be difficult in C, but the Switch_ macro works just fine, and size-wise it's up to 27% or so less, due to the retn instead of the jmp. I'll put a warning in the guide, though, that inside the Case the stack is 4 bytes off. Normally, it should be an issue only for exotic code that picks args manually from the stack.
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Assembled with AsmC
6 ms case 260, MB Switch_ ret
4 ms case 260, MB Switch_ jmp
334 ms case 260, Masm32 switch
4 ms case 260, AsmC .Switch
6 ms case 196, MB Switch_ ret
4 ms case 196, MB Switch_ jmp
252 ms case 196, Masm32 switch
5 ms case 196, AsmC .Switch
6 ms case 132, MB Switch_ ret
4 ms case 132, MB Switch_ jmp
169 ms case 132, Masm32 switch
4 ms case 132, AsmC .Switch
6 ms case 68, MB Switch_ ret
5 ms case 68, MB Switch_ jmp
89 ms case 68, Masm32 switch
4 ms case 68, AsmC .Switch
6 ms case 4, MB Switch_ ret
5 ms case 4, MB Switch_ jmp
4 ms case 4, Masm32 switch
4 ms case 4, AsmC .Switch
2999 bytes for MBSret
4211 bytes for MBSjmp
4799 bytes for Masm32
4200 bytes for asmc
MB Switch_ in jmp mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
MB Switch_ in ret mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
Quote from: nidud on April 24, 2016, 11:01:25 PM
Switch_ eax
Case_ 1
jmp toend
...
Endsw_
toend:
ret
Not sure what you want to demonstrate here... normally, you should never jmp out of a Switch.
Switch_ eax
Case_ 1
; no jmp toend
... ; will arrive after Endsw_ anyway
Endsw_
If you really want to "jump out", use this:
Switch_ eax
Case_ 1
.if !somecondition
MsgBox "whatever", "title", MB_OK
.endif
Endsw_
Quote from: nidud on April 25, 2016, 06:58:50 AM
Here's a "jump in" from MasmBasic:
Event Key
cmp VKey, -VK_F1
je @Open
...
Case 1, TB0+1 ; 2nd toolbar button or 2nd menu entry
@Open:
No problem, stack is fine.
Quote: However, what I actually meant was this:
mov ecx,4
Switch_ ecx, ret
Case_ 1
mov eax,1
Case_ 2
mov eax,2
Case_ 3
mov eax,3
Case_ 4
mov eax,4
Endsw_
The last case jumps to end.
GOOD CATCH! Corrected with version 25 April :t
Hi, jj2007!
Intel(R) Pentium(R) CPU G860 @ 3.00GHz
Assembled with AsmC
7 ms case 260, MB Switch_ ret
5 ms case 260, MB Switch_ jmp
410 ms case 260, Masm32 switch
5 ms case 260, AsmC .Switch
7 ms case 196, MB Switch_ ret
5 ms case 196, MB Switch_ jmp
301 ms case 196, Masm32 switch
5 ms case 196, AsmC .Switch
7 ms case 132, MB Switch_ ret
6 ms case 132, MB Switch_ jmp
204 ms case 132, Masm32 switch
5 ms case 132, AsmC .Switch
7 ms case 68, MB Switch_ ret
5 ms case 68, MB Switch_ jmp
109 ms case 68, Masm32 switch
6 ms case 68, AsmC .Switch
8 ms case 4, MB Switch_ ret
6 ms case 4, MB Switch_ jmp
11 ms case 4, Masm32 switch
5 ms case 4, AsmC .Switch
2999 bytes for MBSret
4211 bytes for MBSjmp
4799 bytes for Masm32
4200 bytes for asmc
MB Switch_ in jmp mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
MB Switch_ in ret mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
Thanks to all of you :icon14:
The problem is to find an application for jump table switches. My editor's help menu, for example, has over 30 entries, so that would be a candidate; but the number of entries can vary, so this is not an option.
WndProc?
WM_NULL equ 0h
WM_CREATE equ 1h
WM_DESTROY equ 2h
WM_MOVE equ 3h
WM_SIZE equ 5h
WM_ACTIVATE equ 6h
Looks like a great candidate, but...
WM_CUT equ 300h
WM_COPY equ 301h
WM_PASTE equ 302h
No good for a jump table ::)
This is a test with 2,000 cases:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Result MB: 0, 26 ms
Result Masm32: 0, 1795 ms
Result AsmC: 0, 29 ms
14084 bytes for swMB
27540 bytes for swMasm32
22111 bytes for swAsmC
But if you increase to 10,000 cases, the timings for Masm32 switch jump up dramatically:
Result MB: 0, 157 ms
Result Masm32: 0, 202547 ms
Result AsmC: 0, 185 ms
I wonder if AMD cpus behave differently...
Each has its advantages. Try a table-based switch on a sequence like 10, 1000, 1000000, 100000000 and you will see why a normal sequential-comparison switch is more general-purpose. Table-based switch statements must work on stored addresses, and with a wide disparity of values the tables become massive.
AMD Athlon(tm) II X2 220 Processor 2.8 GHz
Assembled with AsmC
Result MB: 0, 25 ms
Result Masm32: 0, 2708 ms
Result AsmC: 0, 18 ms
14084 bytes for swMB
27540 bytes for swMasm32
22111 bytes for swAsmC
Nice "deep" article on switch: From Switch Statement Down to Machine Code (http://lazarenko.me/switch/) by Vlad Lazarenko (credits to Frankie at Pelles C (http://forum.pellesc.de/index.php?topic=7160.msg27165#msg27165)).
:biggrin:
Try a "switch" statement with a number range like "0, 2gig, 4gig" and see which algo is faster. :P