I use the instructions below to dispatch through a jump table, and I think
there are too many of them.
Is there a shorter and faster solution?
.data
Label2jmpb dd Labelb1, Labelb2, Labelb3, Labelb4
PtrLabelb dd Label2jmpb
OptionX dd 0
.code
looptodo:
inc OptionX
lea edx, Label2jmpb
mov esi, OptionX
dec esi
shl esi, 2
add edx, esi
jmp dword ptr [edx]
.....
mov esi,OptionX
inc esi
mov OptionX,esi
jmp dword ptr Label2jmpb[4*esi-4]
The slight difference is that ESI does not hold the same value when you reach the branch,
and neither does EDX, because I didn't use it.
mov eax,OptionX
inc OptionX
jmp DWORD PTR [Label2jmpb + 4*eax]
You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.
OptionX equ <ebx> ; or any other register that won't be trashed
inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]
Variant:
push [Label2jmpb + 4*OptionX]
retn
Nice examples, thank you all for your help.
I'll see which one adapts best to the
case I have in mind.
:t
Dave, I have a doubt about the line:
jmp dword ptr Label2jmpb[4*esi-4]
Shouldn't it be:
jmp dword ptr Label2jmpb[4*esi-8]
?
If I have OptionX = 1, what happens with your code?
I think the jump goes not to the first label, but to the second.
I'm assuming that OptionX = 1 means first option, first jump.
Maybe you considered OptionX = 0 as the first one?
I can see that all the examples posted have assumed this.
My routine tells something different, but it is not a problem,
I can easily adapt all of them to my needs.
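To make the two indexing conventions in this exchange concrete, here is a minimal sketch. It assumes Label2jmpb is the dd table from the first post and that eax already holds the option number; using eax is my choice for illustration, not something the posters specified. Only one of the two jumps would appear in real code.

```asm
; Fragment only: assumes Label2jmpb (dd table of 4 labels) from the first post
; and that eax already holds the option number. Use ONE of the two forms.

; Zero-based (eax = 0..3): entry 0 is Labelb1
    jmp DWORD PTR [Label2jmpb + 4*eax]

; One-based (eax = 1..4): the -4 displacement makes 1 select Labelb1
    jmp DWORD PTR [Label2jmpb + 4*eax - 4]
```

The displacement is folded into the address at assembly time, so the one-based form costs no extra instruction.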
Quote from: jj2007 on December 30, 2012, 08:22:39 AM
OptionX equ <ebx> ; or any other register that won't be trashed
inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]
Variant:
push [Label2jmpb + 4*OptionX]
retn
Jochen, I tried to implement your solution, but the program hangs.
Any idea why? Did I make a mistake?
OptionX equ <ebx>
mov OptionX, [Elem-1]
jmp DWORD PTR [Label2jmpb + 4*OptionX]
Elem contains a number from 1 to 4, and is a dword.
let's say that OptionX is currently = 0
inc OptionX
lea edx, Label2jmpb
mov esi, OptionX
dec esi ;<-------------------
shl esi, 2
add edx, esi
jmp dword ptr [edx]
OptionX is now = 1 and you branch to the first label in the list
----------------------------------
let's say that OptionX is currently = 0
mov esi,OptionX
inc esi
mov OptionX,esi
jmp dword ptr Label2jmpb[4*esi-4]
OptionX is now = 1 and you branch to the first label in the list
i think Jochen's hangs because you can't take "4*[label]"
the assembler probably sees that as "4*offset label"
as for Tedd,
Quote: You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.
a clock cycle is a clock cycle - lol
n clock cycles + x clock cycles = (n + x) clock cycles :biggrin:
(distributive law of clock cycles ???)
HOWEVER, i am not sure mine is any faster, as there are dependencies at every step
probably a wash :P
Tedd's code looks like it might be a couple bytes shorter
OK dave, if you start with OptionX = 0 I understand your code :t
As for Jochen's code, I have to check whether I added something strange, as
you suggest.
Let's see.
"4*offset label"
Well apparently OptionX is ebx, so the code is doing 4*ebx.
Oh I found a solution. Starting from OptionX = -1
and avoiding:
mov OptionX, [Elem-1]
Now it works.
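A likely reason the `mov OptionX, [Elem-1]` attempt hung: MASM reads `[Elem-1]` as the dword stored one byte before Elem, not as the value of Elem minus one. On little-endian x86, with Elem between 1 and 4, that misaligned load yields at least 256, so the jump indexes far outside the four-entry table. A minimal sketch of an alternative that keeps Elem in the range 1 to 4, assuming Elem is the dword described above and ebx is free to use:

```asm
; Sketch only: assumes Elem (dword holding 1..4) and Label2jmpb as defined
; earlier in the thread; ebx is an arbitrary choice of register.
    mov ebx, Elem                      ; load the VALUE of Elem (1..4)
    dec ebx                            ; convert to a zero-based index (0..3)
    jmp DWORD PTR [Label2jmpb + 4*ebx] ; 0 -> Labelb1, ... 3 -> Labelb4
```

This does not range-check Elem; a value outside 1 to 4 would still jump through memory beyond the table.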
Time for some timings ;)
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles
2226 cycles for 100 * JJ simple
2767 cycles for 100 * Frank
2919 cycles for 100 * Dave
2228 cycles for 100 * JJ simple
2913 cycles for 100 * Frank
2923 cycles for 100 * Dave
2225 cycles for 100 * JJ simple
2918 cycles for 100 * Frank
2924 cycles for 100 * Dave
27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave
Very good Jochen :t
Your version is much shorter and also faster.
I think I had already adopted it on a hunch :lol:
Didn't you test Tedd's code?
AMD doesn't like this...
AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 227/100 cycles
7196 cycles for 100 * JJ simple
8552 cycles for 100 * Frank
8542 cycles for 100 * Dave
7284 cycles for 100 * JJ simple
8464 cycles for 100 * Frank
8461 cycles for 100 * Dave
7199 cycles for 100 * JJ simple
8415 cycles for 100 * Frank
8424 cycles for 100 * Dave
27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave
Hi,
A mishmash of results. jj's is consistently faster.
pre-P4
loop overhead is approx. 224/100 cycles
6562 cycles for 100 * JJ simple
9383 cycles for 100 * Frank
7870 cycles for 100 * Dave
6560 cycles for 100 * JJ simple
9389 cycles for 100 * Frank
7873 cycles for 100 * Dave
6562 cycles for 100 * JJ simple
9378 cycles for 100 * Frank
7916 cycles for 100 * Dave
27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave
--- ok ---
Mobile Intel(R) Celeron(R) processor 600MHz (SSE2)
loop overhead is approx. 211/100 cycles
2206 cycles for 100 * JJ simple
2712 cycles for 100 * Frank
3119 cycles for 100 * Dave
2229 cycles for 100 * JJ simple
2860 cycles for 100 * Frank
3119 cycles for 100 * Dave
2239 cycles for 100 * JJ simple
2684 cycles for 100 * Frank
3152 cycles for 100 * Dave
27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave
--- ok ---
pre-P4 (SSE1)
loop overhead is approx. 210/100 cycles
7279 cycles for 100 * JJ simple
10285 cycles for 100 * Frank
10122 cycles for 100 * Dave
7278 cycles for 100 * JJ simple
10296 cycles for 100 * Frank
10078 cycles for 100 * Dave
7289 cycles for 100 * JJ simple
10279 cycles for 100 * Frank
10176 cycles for 100 * Dave
27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave
--- ok ---
Cheers,
Steve N.