The MASM Forum

General => The Laboratory => Topic started by: jj2007 on April 24, 2016, 07:34:02 PM

Title: Switch macro timings
Post by: jj2007 on April 24, 2016, 07:34:02 PM
Testing Nidud's new .Switch macro against MasmBasic's Switch_ (today's version, download here (http://masm32.com/board/index.php?topic=94.0)). Timings on other processors would be welcome. The exe does not wait for a keypress (no inkey), so run it from a DOS prompt.

Note the tradeoff between size and speed for the MasmBasic "ret" and "jmp" versions; the latter produces code that is almost identical to AsmC's .Switch macro (I still have to find out where the 8 or so extra bytes come from  :( ). What is remarkable is that there is already a speed gain for more than 4 cases, compared to the Masm32 switch macro.

Note also (quoted from \Masm32\MasmBasic\MbGuide.rtf, but equally valid for AsmC .Switch):
- avoid using Switch_ with few cases that are far apart, e.g. case -1000, case 1, case 1000, as this
  creates a huge jump table with many default entries; the Masm32 switch macro will be more size-efficient


Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with AsmC
11 ms   case 260, MB Switch_ ret
8 ms    case 260, MB Switch_ jmp
370 ms  case 260, Masm32 switch
5 ms    case 260, AsmC .Switch

7 ms    case 196, MB Switch_ ret
5 ms    case 196, MB Switch_ jmp
286 ms  case 196, Masm32 switch
5 ms    case 196, AsmC .Switch

7 ms    case 132, MB Switch_ ret
5 ms    case 132, MB Switch_ jmp
189 ms  case 132, Masm32 switch
5 ms    case 132, AsmC .Switch

7 ms    case 68, MB Switch_ ret
5 ms    case 68, MB Switch_ jmp
98 ms   case 68, Masm32 switch
5 ms    case 68, AsmC .Switch

8 ms    case 4, MB Switch_ ret
5 ms    case 4, MB Switch_ jmp
10 ms   case 4, Masm32 switch
5 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7


Source and exe attached; for building your own version, RichMasm works best, as it creates the AsmUsed$() macro (see above, "Assembled with AsmC"), which allows selective inclusion of Nidud's .Switch based on the assembler used. But the *.asc source opens in Wordpad, too.

P.S.: The macro assembles fine with everything from ML 6.15 .. 10, JWasm, HJWasm and AsmC; the ML versions build about half as fast, though, due to the excessive number of cases tested (300+).

Note also that there is cases=... in line 21; ML chokes above ca. 920 cases, AsmC seems not so impressed, but prepare for fat executables, e.g. for 10,000 cases:
109663  bytes for MBSret
149663  bytes for MBSjmp
179345  bytes for Masm32
149658  bytes for asmc
8)
Title: Re: Switch macro timings
Post by: sinsi on April 24, 2016, 08:04:20 PM
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Assembled with AsmC
5 ms    case 260, MB Switch_ ret
4 ms    case 260, MB Switch_ jmp
294 ms  case 260, Masm32 switch
4 ms    case 260, AsmC .Switch

5 ms    case 196, MB Switch_ ret
4 ms    case 196, MB Switch_ jmp
226 ms  case 196, Masm32 switch
3 ms    case 196, AsmC .Switch

5 ms    case 132, MB Switch_ ret
4 ms    case 132, MB Switch_ jmp
149 ms  case 132, Masm32 switch
4 ms    case 132, AsmC .Switch

6 ms    case 68, MB Switch_ ret
4 ms    case 68, MB Switch_ jmp
81 ms   case 68, Masm32 switch
3 ms    case 68, AsmC .Switch

6 ms    case 4, MB Switch_ ret
4 ms    case 4, MB Switch_ jmp
4 ms    case 4, Masm32 switch
3 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7
Title: Re: Switch macro timings
Post by: sinsi on April 24, 2016, 08:07:59 PM
AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
Assembled with AsmC
4 ms    case 260, MB Switch_ ret
4 ms    case 260, MB Switch_ jmp
313 ms  case 260, Masm32 switch
4 ms    case 260, AsmC .Switch

4 ms    case 196, MB Switch_ ret
3 ms    case 196, MB Switch_ jmp
218 ms  case 196, Masm32 switch
6 ms    case 196, AsmC .Switch

4 ms    case 132, MB Switch_ ret
3 ms    case 132, MB Switch_ jmp
147 ms  case 132, Masm32 switch
7 ms    case 132, AsmC .Switch

5 ms    case 68, MB Switch_ ret
4 ms    case 68, MB Switch_ jmp
77 ms   case 68, Masm32 switch
4 ms    case 68, AsmC .Switch

5 ms    case 4, MB Switch_ ret
6 ms    case 4, MB Switch_ jmp
6 ms    case 4, Masm32 switch
4 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

Title: Re: Switch macro timings
Post by: nidud on April 24, 2016, 08:38:17 PM
deleted
Title: Re: Switch macro timings
Post by: nidud on April 24, 2016, 08:45:03 PM
deleted
Title: Re: Switch macro timings
Post by: jj2007 on April 24, 2016, 10:26:21 PM
Quote from: nidud on April 24, 2016, 08:38:17 PM
It's a slick solution but impossible to implement given the stack is uneven at the case entry

That could indeed be difficult in C, but the Switch_ macro works just fine, and size-wise the code is up to 27% or so smaller, due to the retn instead of the jmp. I'll put a warning in the guide, though, that inside a Case the stack is 4 bytes off. Normally, that should be an issue only for exotic code that picks args manually from the stack.
Title: Re: Switch macro timings
Post by: nidud on April 24, 2016, 11:01:25 PM
deleted
Title: Re: Switch macro timings
Post by: Siekmanski on April 25, 2016, 03:58:00 AM
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Assembled with AsmC
6 ms    case 260, MB Switch_ ret
4 ms    case 260, MB Switch_ jmp
334 ms  case 260, Masm32 switch
4 ms    case 260, AsmC .Switch

6 ms    case 196, MB Switch_ ret
4 ms    case 196, MB Switch_ jmp
252 ms  case 196, Masm32 switch
5 ms    case 196, AsmC .Switch

6 ms    case 132, MB Switch_ ret
4 ms    case 132, MB Switch_ jmp
169 ms  case 132, Masm32 switch
4 ms    case 132, AsmC .Switch

6 ms    case 68, MB Switch_ ret
5 ms    case 68, MB Switch_ jmp
89 ms   case 68, Masm32 switch
4 ms    case 68, AsmC .Switch

6 ms    case 4, MB Switch_ ret
5 ms    case 4, MB Switch_ jmp
4 ms    case 4, Masm32 switch
4 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7
Title: Re: Switch macro timings
Post by: jj2007 on April 25, 2016, 06:04:56 AM
Quote from: nidud on April 24, 2016, 11:01:25 PM

Switch_ eax
  Case_ 1
jmp toend
...
Endsw_
toend:
ret

Not sure what you want to demonstrate here... normally, you should never jmp out of a Switch.

Switch_ eax
  Case_ 1
; no jmp toend
...  ; will arrive after Endsw_ anyway
Endsw_


If you really want to "jump out", use this:
Switch_ eax
  Case_ 1
.if !somecondition
MsgBox "whatever", "title", MB_OK
.endif
Endsw_
Title: Re: Switch macro timings
Post by: nidud on April 25, 2016, 06:58:50 AM
deleted
Title: Re: Switch macro timings
Post by: jj2007 on April 25, 2016, 08:03:17 AM
Quote from: nidud on April 25, 2016, 06:58:50 AM
Here's a "jump in" from MasmBasic:

Event Key
  cmp VKey, -VK_F1
  je @Open
  ...
  Case 1, TB0+1 ; 2nd toolbar button or 2nd menu entry
  @Open:

No problem, stack is fine.

Quote
However, what I actually meant was this:

mov ecx,4
Switch_ ecx, ret
  Case_ 1
mov eax,1
  Case_ 2
mov eax,2
  Case_ 3
mov eax,3
  Case_ 4
mov eax,4
Endsw_


The last case jumps to end.

GOOD CATCH! Corrected in the version of 25 April.
Title: Re: Switch macro timings
Post by: Mikl__ on April 25, 2016, 12:01:06 PM
Hi, jj2007!
Intel(R) Pentium(R) CPU G860 @ 3.00GHz
Assembled with AsmC
7 ms case 260, MB Switch_ ret
5 ms case 260, MB Switch_ jmp
410 ms case 260, Masm32 switch
5 ms case 260, AsmC .Switch

7 ms case 196, MB Switch_ ret
5 ms case 196, MB Switch_ jmp
301 ms case 196, Masm32 switch
5 ms case 196, AsmC .Switch

7 ms case 132, MB Switch_ ret
6 ms case 132, MB Switch_ jmp
204 ms case 132, Masm32 switch
5 ms case 132, AsmC .Switch

7 ms case 68, MB Switch_ ret
5 ms case 68, MB Switch_ jmp
109 ms case 68, Masm32 switch
6 ms case 68, AsmC .Switch

8 ms case 4, MB Switch_ ret
6 ms case 4, MB Switch_ jmp
11 ms case 4, Masm32 switch
5 ms case 4, AsmC .Switch

2999 bytes for MBSret
4211 bytes for MBSjmp
4799 bytes for Masm32
4200 bytes for asmc

MB Switch_ in jmp mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7

MB Switch_ in ret mode:
ct=-2 Case -2
ct=-1 Default
ct=0 Default
ct=1 Case 1 .. 4
ct=2 ecx=2
ct=3 Case 1 .. 4
ct=4 edx=4
ct=5 Default
ct=6 var1=6
ct=7 Case 7
Title: Re: Switch macro timings
Post by: jj2007 on April 25, 2016, 12:31:58 PM
Thanks to all of you!

The problem is to find an application for jump table switches. My editor's help menu, for example, has over 30 entries, so that would be a candidate; but the number of entries can vary, so this is not an option.

WndProc?
WM_NULL                              equ 0h
WM_CREATE                            equ 1h
WM_DESTROY                           equ 2h
WM_MOVE                              equ 3h
WM_SIZE                              equ 5h
WM_ACTIVATE                          equ 6h


Looks like a great candidate, but...
WM_CUT                               equ 300h
WM_COPY                              equ 301h
WM_PASTE                             equ 302h


No good for a jump table ::)
Title: Re: Switch macro timings
Post by: nidud on April 26, 2016, 12:45:54 AM
deleted
Title: Re: Switch macro timings
Post by: jj2007 on April 29, 2016, 09:37:58 PM
This is a test with 2,000 cases:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Result MB:      0, 26 ms
Result Masm32:  0, 1795 ms
Result AsmC:    0, 29 ms

14084   bytes for swMB
27540   bytes for swMasm32
22111   bytes for swAsmC


But if you increase to 10,000 cases, the timings for Masm32 switch jump up dramatically:
Result MB:      0, 157 ms
Result Masm32:  0, 202547 ms
Result AsmC:    0, 185 ms


I wonder if AMD CPUs behave differently...
Title: Re: Switch macro timings
Post by: hutch-- on April 29, 2016, 09:46:32 PM
Each has its advantages. Try a table-based switch on a sequence like 10, 1000, 1000000, 100000000 and you will see why a normal sequential-comparison switch is more general purpose. Table-based switch statements must work on stored addresses, and with a wide disparity of values the tables become massive.
Title: Re: Switch macro timings
Post by: TWell on April 29, 2016, 11:38:34 PM
AMD Athlon(tm) II X2 220 Processor 2.8 GHz
Assembled with AsmC
Result MB:      0, 25 ms
Result Masm32:  0, 2708 ms
Result AsmC:    0, 18 ms

14084   bytes for swMB
27540   bytes for swMasm32
22111   bytes for swAsmC
Title: Re: Switch macro timings
Post by: jj2007 on June 16, 2017, 09:38:44 PM
Nice "deep" article on switch: From Switch Statement Down to Machine Code (http://lazarenko.me/switch/) by Vlad Lazarenko (credits to Frankie at Pelles C (http://forum.pellesc.de/index.php?topic=7160.msg27165#msg27165)).
Title: Re: Switch macro timings
Post by: hutch-- on June 27, 2017, 07:08:06 PM
 :biggrin:

Try a "switch" statement with a number range like "0, 2gig, 4gig" and see which algo is faster.  :P