Switch macro timings

Started by jj2007, April 24, 2016, 07:34:02 PM


jj2007

Testing Nidud's new .Switch macro against MasmBasic's version (today's build). Some timings on other processors would be nice. There is no inkey at the end, so run it from a DOS prompt.

Note the tradeoff between size and speed for the MasmBasic "ret" and "jmp" versions; the latter produces code that is almost identical to AsmC's .Switch macro (I still have to find out where the 8 or so extra bytes come from :( ). Remarkably, there is already a speed gain over the Masm32 switch macro with more than 4 cases.
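The timings below grow with the case number for the Masm32 switch and stay flat for the table-based macros. A minimal Python sketch of why that pattern would appear, assuming the Masm32 switch tests its cases one after another and the other macros do one range check plus an indexed jump. The function names are hypothetical, the case values 1..300 are an assumption, and the counts model comparisons, not cycles:

```python
def chain_compares(cases, value):
    """Comparisons a sequential compare chain makes before it finds a match."""
    for count, case in enumerate(cases, start=1):
        if case == value:
            return count
    return len(cases)  # no match: every case was tested before reaching the default


def table_compares(cases, value):
    """A jump table is modelled as one range check, whatever the value is."""
    return 1


cases = list(range(1, 301))  # assumed: 300 consecutive case values
for value in (4, 68, 132, 196, 260):
    print(value, chain_compares(cases, value), table_compares(cases, value))
```

In this model the chain cost equals the position of the matching case, which is roughly the shape of the Masm32 column in the results.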

Note also (quote from \Masm32\MasmBasic\MbGuide.rtf, but valid also for AsmC .Switch):
- avoid using Switch_ with few cases that are far apart, e.g. case -1000, case 1, case 1000, as this
  creates a huge jump table with many default entries; the Masm32 switch macro will be more size-efficient
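To put a number on that warning, here is a small Python sketch, assuming the table gets one slot for every value between the lowest and the highest case. The helper names and the string "handlers" are made up for illustration; the real macros store code addresses:

```python
def build_jump_table(handlers, default):
    """Return (lowest case, table) with one slot per value from lowest to highest case."""
    low, high = min(handlers), max(handlers)
    table = [handlers.get(value, default) for value in range(low, high + 1)]
    return low, table


def dispatch(low, table, default, value):
    """Range-check the value, then index the table."""
    index = value - low
    if 0 <= index < len(table):
        return table[index]
    return default


# The three far-apart cases from the guide's example
handlers = {-1000: "case -1000", 1: "case 1", 1000: "case 1000"}
low, table = build_jump_table(handlers, "default")
print(len(table), table.count("default"))  # 2001 slots, 1998 of them default
print(dispatch(low, table, "default", 1))
```

With 4-byte entries, as one would expect in 32-bit code, those 2001 slots would take about 8 KB for three real cases.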


Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Assembled with AsmC
11 ms   case 260, MB Switch_ ret
8 ms    case 260, MB Switch_ jmp
370 ms  case 260, Masm32 switch
5 ms    case 260, AsmC .Switch

7 ms    case 196, MB Switch_ ret
5 ms    case 196, MB Switch_ jmp
286 ms  case 196, Masm32 switch
5 ms    case 196, AsmC .Switch

7 ms    case 132, MB Switch_ ret
5 ms    case 132, MB Switch_ jmp
189 ms  case 132, Masm32 switch
5 ms    case 132, AsmC .Switch

7 ms    case 68, MB Switch_ ret
5 ms    case 68, MB Switch_ jmp
98 ms   case 68, Masm32 switch
5 ms    case 68, AsmC .Switch

8 ms    case 4, MB Switch_ ret
5 ms    case 4, MB Switch_ jmp
10 ms   case 4, Masm32 switch
5 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7


Source and exe attached; for building your own version, RichMasm works best, as it creates the AsmUsed$() macro (see above, "Assembled with AsmC"), which allows selective inclusion of Nidud's .Switch based on the assembler used. But the *.asc source opens in Wordpad, too.

P.S.: The macro assembles fine with everything from ML 6.15 .. 10, JWasm, HJWasm and AsmC; the ML versions build about half as fast, though, due to the excessive number of cases tested (300+).

Note also that there is cases=... in line 21; ML chokes above ca. 920 cases, AsmC seems not so impressed, but prepare for fat executables, e.g. for 10,000 cases:
109663  bytes for MBSret
149663  bytes for MBSjmp
179345  bytes for Masm32
149658  bytes for asmc
8)

sinsi

Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Assembled with AsmC
5 ms    case 260, MB Switch_ ret
4 ms    case 260, MB Switch_ jmp
294 ms  case 260, Masm32 switch
4 ms    case 260, AsmC .Switch

5 ms    case 196, MB Switch_ ret
4 ms    case 196, MB Switch_ jmp
226 ms  case 196, Masm32 switch
3 ms    case 196, AsmC .Switch

5 ms    case 132, MB Switch_ ret
4 ms    case 132, MB Switch_ jmp
149 ms  case 132, Masm32 switch
4 ms    case 132, AsmC .Switch

6 ms    case 68, MB Switch_ ret
4 ms    case 68, MB Switch_ jmp
81 ms   case 68, Masm32 switch
3 ms    case 68, AsmC .Switch

6 ms    case 4, MB Switch_ ret
4 ms    case 4, MB Switch_ jmp
4 ms    case 4, Masm32 switch
3 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

sinsi

AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
Assembled with AsmC
4 ms    case 260, MB Switch_ ret
4 ms    case 260, MB Switch_ jmp
313 ms  case 260, Masm32 switch
4 ms    case 260, AsmC .Switch

4 ms    case 196, MB Switch_ ret
3 ms    case 196, MB Switch_ jmp
218 ms  case 196, Masm32 switch
6 ms    case 196, AsmC .Switch

4 ms    case 132, MB Switch_ ret
3 ms    case 132, MB Switch_ jmp
147 ms  case 132, Masm32 switch
7 ms    case 132, AsmC .Switch

5 ms    case 68, MB Switch_ ret
4 ms    case 68, MB Switch_ jmp
77 ms   case 68, Masm32 switch
4 ms    case 68, AsmC .Switch

5 ms    case 4, MB Switch_ ret
6 ms    case 4, MB Switch_ jmp
6 ms    case 4, Masm32 switch
4 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7


nidud

#3
deleted

nidud

#4
deleted

jj2007

Quote from: nidud on April 24, 2016, 08:38:17 PM
It's a slick solution but impossible to implement given the stack is uneven at the case entry

That could indeed be difficult in C, but the Switch_ macro works just fine, and the code is up to 27% or so smaller, thanks to the retn instead of the jmp. I'll put a warning in the guide, though, that inside the Case the stack is 4 bytes off. Normally, that should be an issue only for exotic code that picks args manually from the stack.

nidud

#6
deleted

Siekmanski

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Assembled with AsmC
6 ms    case 260, MB Switch_ ret
4 ms    case 260, MB Switch_ jmp
334 ms  case 260, Masm32 switch
4 ms    case 260, AsmC .Switch

6 ms    case 196, MB Switch_ ret
4 ms    case 196, MB Switch_ jmp
252 ms  case 196, Masm32 switch
5 ms    case 196, AsmC .Switch

6 ms    case 132, MB Switch_ ret
4 ms    case 132, MB Switch_ jmp
169 ms  case 132, Masm32 switch
4 ms    case 132, AsmC .Switch

6 ms    case 68, MB Switch_ ret
5 ms    case 68, MB Switch_ jmp
89 ms   case 68, Masm32 switch
4 ms    case 68, AsmC .Switch

6 ms    case 4, MB Switch_ ret
5 ms    case 4, MB Switch_ jmp
4 ms    case 4, Masm32 switch
4 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7
Creative coders use backward thinking techniques as a strategy.

jj2007

Quote from: nidud on April 24, 2016, 11:01:25 PM

Switch_ eax
  Case_ 1
    jmp toend
    ...
Endsw_
toend:
  ret

Not sure what you want to demonstrate here... normally, you should never jmp out of a Switch.

Switch_ eax
  Case_ 1
    ; no jmp toend
    ...  ; will arrive after Endsw_ anyway
Endsw_


If you really want to "jump out", use this:
Switch_ eax
  Case_ 1
    .if !somecondition
      MsgBox "whatever", "title", MB_OK
    .endif
Endsw_

nidud

#9
deleted

jj2007

Quote from: nidud on April 25, 2016, 06:58:50 AM
Here's a "jump in" from MasmBasic:

Event Key
  cmp VKey, -VK_F1
  je @Open
  ...
  Case 1, TB0+1 ; 2nd toolbar button or 2nd menu entry
  @Open:

No problem, stack is fine.

Quote
However, what I actually meant was this:

mov ecx,4
Switch_ ecx, ret
  Case_ 1
    mov eax,1
  Case_ 2
    mov eax,2
  Case_ 3
    mov eax,3
  Case_ 4
    mov eax,4
Endsw_


The last case jumps to end.

GOOD CATCH! Corrected with version 25 April :t

Mikl__

Hi, jj2007!
Intel(R) Pentium(R) CPU G860 @ 3.00GHz
Assembled with AsmC
7 ms    case 260, MB Switch_ ret
5 ms    case 260, MB Switch_ jmp
410 ms  case 260, Masm32 switch
5 ms    case 260, AsmC .Switch

7 ms    case 196, MB Switch_ ret
5 ms    case 196, MB Switch_ jmp
301 ms  case 196, Masm32 switch
5 ms    case 196, AsmC .Switch

7 ms    case 132, MB Switch_ ret
6 ms    case 132, MB Switch_ jmp
204 ms  case 132, Masm32 switch
5 ms    case 132, AsmC .Switch

7 ms    case 68, MB Switch_ ret
5 ms    case 68, MB Switch_ jmp
109 ms  case 68, Masm32 switch
6 ms    case 68, AsmC .Switch

8 ms    case 4, MB Switch_ ret
6 ms    case 4, MB Switch_ jmp
11 ms   case 4, Masm32 switch
5 ms    case 4, AsmC .Switch

2999    bytes for MBSret
4211    bytes for MBSjmp
4799    bytes for Masm32
4200    bytes for asmc

MB Switch_ in jmp mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

MB Switch_ in ret mode:
ct=-2   Case -2
ct=-1   Default
ct=0    Default
ct=1    Case 1 .. 4
ct=2    ecx=2
ct=3    Case 1 .. 4
ct=4    edx=4
ct=5    Default
ct=6    var1=6
ct=7    Case 7

jj2007

Thanks to all of you :icon14:

The problem is to find an application for jump table switches. My editor's help menu, for example, has over 30 entries, so that would be a candidate; but the number of entries can vary, so this is not an option.

WndProc?
WM_NULL                              equ 0h
WM_CREATE                            equ 1h
WM_DESTROY                           equ 2h
WM_MOVE                              equ 3h
WM_SIZE                              equ 5h
WM_ACTIVATE                          equ 6h


Looks like a great candidate, but...
WM_CUT                               equ 300h
WM_COPY                              equ 301h
WM_PASTE                             equ 302h


No good for a jump table ::)
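A quick Python sketch of how sparse those message values are, again assuming one table slot per value between the lowest and highest case. It uses only the nine constants listed above; `table_slots` is a hypothetical helper:

```python
WM = {
    "WM_NULL": 0x0, "WM_CREATE": 0x1, "WM_DESTROY": 0x2, "WM_MOVE": 0x3,
    "WM_SIZE": 0x5, "WM_ACTIVATE": 0x6,
    "WM_CUT": 0x300, "WM_COPY": 0x301, "WM_PASTE": 0x302,
}


def table_slots(values):
    """Slots a single jump table needs to cover every value from min to max."""
    return max(values) - min(values) + 1


low_block = [v for v in WM.values() if v < 0x100]  # the dense block at the start
all_values = list(WM.values())

print(table_slots(low_block), len(low_block))    # 7 slots for 6 cases
print(table_slots(all_values), len(all_values))  # 771 slots for 9 cases
```

The low block alone would be a fine table; adding the three clipboard messages stretches it to 771 slots, nearly all of them default entries.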

nidud

#13
deleted

jj2007

This is a test with 2,000 cases:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Result MB:      0, 26 ms
Result Masm32:  0, 1795 ms
Result AsmC:    0, 29 ms

14084   bytes for swMB
27540   bytes for swMasm32
22111   bytes for swAsmC


But if you increase to 10,000 cases, the timings for Masm32 switch jump up dramatically:
Result MB:      0, 157 ms
Result Masm32:  0, 202547 ms
Result AsmC:    0, 185 ms


I wonder if AMD cpus behave differently...
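The scaling between the two runs above can be put in numbers. This Python sketch only divides the posted timings and makes no claim about the cause:

```python
# macro: (ms at 2,000 cases, ms at 10,000 cases), copied from the results above
runs = {
    "MB": (26, 157),
    "Masm32": (1795, 202547),
    "AsmC": (29, 185),
}
case_growth = 10_000 / 2_000  # five times as many cases

for name, (small, large) in runs.items():
    print(f"{name}: {large / small:.1f}x the time for {case_growth:.0f}x the cases")
```

The two table-based macros grow by roughly 6x for 5x the cases, while the Masm32 switch grows by more than 100x.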