I'd like it shorter

frktons · December 30, 2012, 07:51:23 AM

I use the instructions below to jump through a jump table, and I think
there are too many of them.

Is there a shorter and faster solution?


.data

    Label2jmpb   dd  Labelb1, Labelb2, Labelb3, Labelb4
    PtrLabelb    dd  Label2jmpb 
    OptionX      dd  0

.code

looptodo:
    inc   OptionX
    lea   edx, Label2jmpb
    mov   esi, OptionX
    dec   esi
    shl   esi, 2
    add   edx, esi

    jmp   dword ptr [edx]
.....

dedndave · December 30, 2012, 08:01:12 AM

        mov     esi,OptionX
        inc     esi
        mov     OptionX,esi
        jmp dword ptr Label2jmpb[4*esi-4]

slight difference is that ESI is not the same value when you get to the branch
nor is EDX, because i didn't use it

Tedd · December 30, 2012, 08:15:14 AM

mov eax,OptionX
inc OptionX
jmp DWORD PTR [Label2jmpb + 4*eax]

You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.

jj2007 · December 30, 2012, 08:22:39 AM

OptionX equ <ebx> ; or any other register that won't be trashed

inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]

Variant:
push [Label2jmpb + 4*OptionX]
retn

frktons · December 30, 2012, 09:03:37 AM

Nice examples, thank you all for your help.
I'll see which one best fits the
case I have in mind.
:t

Dave, I have a doubt about the line:

        jmp dword ptr Label2jmpb[4*esi-4]

Shouldn't it be:

        jmp dword ptr Label2jmpb[4*esi-8]

?

If OptionX = 1, what happens with your code?
I think the jump goes to the second label, not the first.
I'm assuming OptionX = 1 means the first option, hence the first jump.
Maybe you considered OptionX = 0 as the first one?

Well, I can see that all the posted examples assume this.
My routine works differently, but that is not a problem;
I can easily adapt all of them to my needs.

frktons · December 30, 2012, 09:41:41 AM

Quote from: jj2007 on December 30, 2012, 08:22:39 AM
OptionX equ <ebx> ; or any other register that won't be trashed

inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]

Variant:
push [Label2jmpb + 4*OptionX]
retn

Jochen, I tried to implement your solution, but the program hangs.
Any idea why? Did I make a mistake?

    OptionX equ <ebx>

    mov OptionX, [Elem-1]
    jmp DWORD PTR [Label2jmpb + 4*OptionX]

Elem contains a number from 1 to 4, and is a dword.
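
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in MASM, `[Elem-1]` is address arithmetic, so the `mov` loads the dword that starts one byte before `Elem`, not the value of `Elem` minus one. With `Elem` between 1 and 4, the loaded value is at least 100h, which indexes far past the four-entry table. The fragment below assumes `Elem` is declared with `dd` in `.data` and reuses `Label2jmpb` from the first post.

```asm
    OptionX equ <ebx>

    ; mov OptionX, [Elem-1]    ; reads 4 bytes at address (Elem - 1):
    ;                          ;   low byte    = whatever byte precedes Elem
    ;                          ;   upper bytes = low 3 bytes of Elem

    mov   OptionX, Elem        ; load the value 1..4
    dec   OptionX              ; convert to a zero-based index 0..3
    jmp   DWORD PTR [Label2jmpb + 4*OptionX]
```

Folding the adjustment into the displacement, as in `jmp DWORD PTR [Label2jmpb + 4*OptionX - 4]` after a plain `mov OptionX, Elem`, should make the `dec` unnecessary.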

dedndave · December 30, 2012, 09:42:51 AM

let's say that OptionX is currently = 0

    inc   OptionX
    lea   edx, Label2jmpb
    mov   esi, OptionX
    dec   esi                         ;<-------------------
    shl   esi, 2
    add   edx, esi

    jmp   dword ptr [edx]

OptionX is now = 1 and you branch to the first label in the list

----------------------------------

let's say that OptionX is currently = 0

        mov     esi,OptionX
        inc     esi
        mov     OptionX,esi
        jmp dword ptr Label2jmpb[4*esi-4]

OptionX is now = 1 and you branch to the first label in the list
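
The index arithmetic in this walk-through can be checked with a small complete program. This is a minimal sketch, assuming a 32-bit Windows build with `ml` and `link` and with `kernel32.lib` on the library path. The `Visited` variable and the exit-code check are additions for illustration and are not part of the thread's code.

```asm
.386
.model flat, stdcall
option casemap:none

ExitProcess PROTO STDCALL :DWORD
includelib kernel32.lib             ; assumption: found via the LIB path

.data
    Label2jmpb  dd  Labelb1, Labelb2, Labelb3, Labelb4
    OptionX     dd  0               ; 0 = nothing selected yet
    Visited     dd  0               ; hypothetical: one bit per label reached

.code
start:
looptodo:
    mov   esi, OptionX
    inc   esi                       ; 1-based option number: 1, 2, 3, 4
    mov   OptionX, esi
    jmp   dword ptr Label2jmpb[4*esi-4]   ; option 1 -> offset 0 -> Labelb1

Labelb1:
    or    Visited, 1
    jmp   looptodo
Labelb2:
    or    Visited, 2
    jmp   looptodo
Labelb3:
    or    Visited, 4
    jmp   looptodo
Labelb4:
    or    Visited, 8
    invoke ExitProcess, Visited     ; exit code 15 if all four labels ran
end start
```

With `OptionX` starting at 0, the four passes select byte offsets 0, 4, 8 and 12, so the `-4` displacement is what maps the 1-based option number onto the zero-based table.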

dedndave · December 30, 2012, 09:45:57 AM

i think Jochen's hangs because you can't take "4*[label]"
the assembler probably sees that as "4*offset label"

dedndave · December 30, 2012, 09:51:33 AM

as for Tedd,

Quote: You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.

a clock cycle is a clock cycle - lol

n clock cycles + x clock cycles = (n + x) clock cycles

(distributive law of clock cycles ???)

HOWEVER, i am not sure mine is any faster, as there are dependencies at every step
probably a wash :P
Tedd's code looks like it might be a couple bytes shorter

frktons · December 30, 2012, 09:54:24 AM

OK dave, if you start with OptionX = 0 I understand your code :t
As for Jochen's code, I have to check whether I added something strange, as
you suggest.
Let's see.

"4*offset label"

Well apparently OptionX is ebx, so the code is doing 4*ebx.

frktons · December 30, 2012, 10:05:17 AM

Oh I found a solution. Starting from OptionX = -1
and avoiding:

mov OptionX, [Elem-1]

Now it works.
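
Starting from -1 presumably works because the `inc` runs before the jump, so the first pass indexes entry 0. A minimal sketch of that arrangement, assuming `OptionX` is the `ebx` text equate from Jochen's post, the table is `Label2jmpb` from the first post, and every handler preserves `ebx` before jumping back to `looptodo`:

```asm
    OptionX equ <ebx>          ; text equate: OptionX expands to ebx

    mov   OptionX, -1          ; one below the first table index
looptodo:
    inc   OptionX              ; -1 -> 0 on the first pass
    jmp   DWORD PTR [Label2jmpb + 4*OptionX]   ; first pass selects Labelb1
```

Nothing here stops a fifth pass from indexing past the end of the table, so the last handler is assumed to leave the loop.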

jj2007 · December 30, 2012, 11:16:25 AM

Time for some timings ;)

Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles

2226 cycles for 100 * JJ simple
2767 cycles for 100 * Frank
2919 cycles for 100 * Dave

2228 cycles for 100 * JJ simple
2913 cycles for 100 * Frank
2923 cycles for 100 * Dave

2225 cycles for 100 * JJ simple
2918 cycles for 100 * Frank
2924 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave

frktons · December 30, 2012, 12:03:16 PM

Very good Jochen :t
Your version is much shorter and also faster. I had
already adopted it, on a secret intuition :lol:

Didn't you test Tedd's code?

sinsi · December 30, 2012, 02:01:34 PM

AMD doesn't like this...

AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 227/100 cycles

7196 cycles for 100 * JJ simple
8552 cycles for 100 * Frank
8542 cycles for 100 * Dave

7284 cycles for 100 * JJ simple
8464 cycles for 100 * Frank
8461 cycles for 100 * Dave

7199 cycles for 100 * JJ simple
8415 cycles for 100 * Frank
8424 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave

FORTRANS · December 31, 2012, 05:13:05 AM

Hi,

A mishmash of results. jj's is consistently faster.

pre-P4
loop overhead is approx. 224/100 cycles

6562	cycles for 100 * JJ simple
9383	cycles for 100 * Frank
7870	cycles for 100 * Dave

6560	cycles for 100 * JJ simple
9389	cycles for 100 * Frank
7873	cycles for 100 * Dave

6562	cycles for 100 * JJ simple
9378	cycles for 100 * Frank
7916	cycles for 100 * Dave

27	bytes for JJ simple
48	bytes for Frank
49	bytes for Dave


--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
loop overhead is approx. 211/100 cycles

2206	cycles for 100 * JJ simple
2712	cycles for 100 * Frank
3119	cycles for 100 * Dave

2229	cycles for 100 * JJ simple
2860	cycles for 100 * Frank
3119	cycles for 100 * Dave

2239	cycles for 100 * JJ simple
2684	cycles for 100 * Frank
3152	cycles for 100 * Dave

27	bytes for JJ simple
48	bytes for Frank
49	bytes for Dave


--- ok ---

pre-P4 (SSE1)
loop overhead is approx. 210/100 cycles

7279	cycles for 100 * JJ simple
10285	cycles for 100 * Frank
10122	cycles for 100 * Dave

7278	cycles for 100 * JJ simple
10296	cycles for 100 * Frank
10078	cycles for 100 * Dave

7289	cycles for 100 * JJ simple
10279	cycles for 100 * Frank
10176	cycles for 100 * Dave

27	bytes for JJ simple
48	bytes for Frank
49	bytes for Dave


--- ok ---

Cheers,

Steve N.

The MASM Forum
