The MASM Forum

General => The Campus => Topic started by: frktons on December 30, 2012, 07:51:23 AM

Title: I'd like it shorter
Post by: frktons on December 30, 2012, 07:51:23 AM
I use these instructions to jump through a jump table, and I think
there are too many of them.

Is there a shorter and faster solution?

.data

    Label2jmpb   dd  Labelb1, Labelb2, Labelb3, Labelb4
    PtrLabelb    dd  Label2jmpb
    OptionX      dd  0

.code

looptodo:
    inc   OptionX
    lea   edx, Label2jmpb
    mov   esi, OptionX
    dec   esi
    shl   esi, 2
    add   edx, esi

    jmp   dword ptr [edx]
.....

Title: Re: I'd like it shorter
Post by: dedndave on December 30, 2012, 08:01:12 AM
        mov     esi,OptionX
        inc     esi
        mov     OptionX,esi
        jmp dword ptr Label2jmpb[4*esi-4]


slight difference: ESI does not hold the same value when you get to the branch (it holds OptionX, not the byte offset)
and EDX is untouched, because i didn't use it
Title: Re: I'd like it shorter
Post by: Tedd on December 30, 2012, 08:15:14 AM

mov eax,OptionX
inc OptionX
jmp DWORD PTR [Label2jmpb + 4*eax]



You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.
Title: Re: I'd like it shorter
Post by: jj2007 on December 30, 2012, 08:22:39 AM
OptionX equ <ebx>  ; or any other register that won't be trashed

inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]

Variant:
push [Label2jmpb + 4*OptionX]
retn
Title: Re: I'd like it shorter
Post by: frktons on December 30, 2012, 09:03:37 AM
Nice examples, thank you all for your help.
I'll see which one best fits the
case where I plan to use it.
:t

Dave, I have a question about the line:

        jmp dword ptr Label2jmpb[4*esi-4]

Shouldn't it be:

        jmp dword ptr Label2jmpb[4*esi-8]
?

If I have OptionX = 1, what happens with your code?
I think it jumps to the second label, not the first.
I'm assuming OptionX = 1 means the first option, so the first jump.
Maybe you considered OptionX = 0 as the first one?

Well, I can see that all the posted examples assume this.
My routine works differently, but that's not a problem:
I can easily adapt all of them to my needs.
Title: Re: I'd like it shorter
Post by: frktons on December 30, 2012, 09:41:41 AM
Quote from: jj2007 on December 30, 2012, 08:22:39 AM
OptionX equ <ebx>  ; or any other register that won't be trashed

inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]

Variant:
push [Label2jmpb + 4*OptionX]
retn

Jochen, I tried to implement your solution, but the program hangs.
Any idea why? Did I make a mistake?

    OptionX equ <ebx>

    mov OptionX, [Elem-1]
    jmp DWORD PTR [Label2jmpb + 4*OptionX]


Elem contains a number from 1 to 4, and is a dword.
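
A likely cause of the hang, sketched under assumptions: in MASM, `[Elem-1]` is the address one byte before `Elem`, so `mov ebx, [Elem-1]` loads four bytes starting there rather than the value of `Elem` minus one. With `Elem` holding 1 to 4, the loaded value is at least 256, so the jump reads far outside the four-entry table. The sketch below loads the value first and folds the "minus one entry" into the displacement. The MASM32 include path, the test value stored in `Elem`, and the `print`/`exit` macros are assumptions for illustration and are not part of the posted code.

```asm
include \masm32\include\masm32rt.inc    ; assumption: MASM32 SDK in its default location

.data
    Label2jmpb  dd  Labelb1, Labelb2, Labelb3, Labelb4
    Elem        dd  3                   ; hypothetical test value in the range 1..4

.code
start:
    ; "mov ebx, [Elem-1]" would read a dword at address Elem-1,
    ; which is not the same as (value of Elem) - 1.
    mov     ebx, Elem                           ; ebx = 1..4
    jmp     DWORD PTR [Label2jmpb + 4*ebx - 4]  ; 1 -> entry 0, 2 -> entry 1, ...

Labelb1:
    print   "first", 13, 10
    exit
Labelb2:
    print   "second", 13, 10
    exit
Labelb3:
    print   "third", 13, 10
    exit
Labelb4:
    print   "fourth", 13, 10
    exit
end start
```

An equivalent form is `mov ebx, Elem` followed by `dec ebx` and an unadjusted `[Label2jmpb + 4*ebx]`; putting the `-4` in the displacement simply saves one instruction.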
Title: Re: I'd like it shorter
Post by: dedndave on December 30, 2012, 09:42:51 AM
let's say that OptionX is currently = 0
    inc   OptionX
    lea   edx, Label2jmpb
    mov   esi, OptionX
    dec   esi                         ;<-------------------
    shl   esi, 2
    add   edx, esi

    jmp   dword ptr [edx]


OptionX is now = 1 and you branch to the first label in the list

----------------------------------

let's say that OptionX is currently = 0
        mov     esi,OptionX
        inc     esi
        mov     OptionX,esi
        jmp dword ptr Label2jmpb[4*esi-4]


OptionX is now = 1 and you branch to the first label in the list
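
The second trace above can be turned into a small test program that visits all four labels in order. This is only a sketch: the MASM32 include path, the `print`/`exit` macros, and the bodies of the four labels are assumptions added for illustration. The point it shows is the index arithmetic: with OptionX counting 1..4, the byte offset `4*esi-4` comes out as 0, 4, 8, 12, so `-4` (one dword entry) is the right adjustment and `-8` would point before the table when OptionX = 1.

```asm
include \masm32\include\masm32rt.inc    ; assumption: MASM32 SDK in its default location

.data
    Label2jmpb  dd  Labelb1, Labelb2, Labelb3, Labelb4
    OptionX     dd  0                   ; starts at 0, counts 1..4 inside the loop

.code
start:
looptodo:
    mov     esi, OptionX
    inc     esi                             ; esi = 1, 2, 3, 4 on successive passes
    mov     OptionX, esi
    jmp     dword ptr Label2jmpb[4*esi-4]   ; byte offset 0, 4, 8, 12

Labelb1:                                ; reached when OptionX = 1
    print   "first", 13, 10
    jmp     looptodo
Labelb2:                                ; reached when OptionX = 2
    print   "second", 13, 10
    jmp     looptodo
Labelb3:                                ; reached when OptionX = 3
    print   "third", 13, 10
    jmp     looptodo
Labelb4:                                ; reached when OptionX = 4, last entry
    print   "fourth", 13, 10
    exit
end start
```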
Title: Re: I'd like it shorter
Post by: dedndave on December 30, 2012, 09:45:57 AM
i think Jochen's hangs because you can't take "4*[label]"
the assembler probably sees that as "4*offset label"
Title: Re: I'd like it shorter
Post by: dedndave on December 30, 2012, 09:51:33 AM
as for Tedd,
Quote: You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.
a clock cycle is a clock cycle - lol

n clock cycles + x clock cycles = (n + x) clock cycles   :biggrin:
(distributive law of clock cycles ???)

HOWEVER, i am not sure mine is any faster, as there are dependencies at every step
probably a wash   :P
Tedd's code looks like it might be a couple of bytes shorter
Title: Re: I'd like it shorter
Post by: frktons on December 30, 2012, 09:54:24 AM
OK dave, if you start with OptionX = 0 I understand your code  :t
As for Jochen's code, I have to check whether I added something strange, as
you suggest.
Let's see.

"4*offset label"

Well apparently OptionX is ebx, so the code is doing 4*ebx.
Title: Re: I'd like it shorter
Post by: frktons on December 30, 2012, 10:05:17 AM
Oh, I found a solution: starting from OptionX = -1
and avoiding:

mov OptionX, [Elem-1]


Now it works.
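
The register-based variant with the -1 start can be sketched as a complete loop. Because `OptionX` is a text equate for `ebx`, every use assembles as `ebx`; starting at -1 means the first `inc` yields index 0, which matches the unadjusted `4*OptionX` scaling. The MASM32 include path, the `print`/`exit` macros, and the label bodies are assumptions for illustration, and the sketch relies on the called code preserving `ebx`.

```asm
include \masm32\include\masm32rt.inc    ; assumption: MASM32 SDK in its default location

OptionX equ <ebx>                       ; text equate: OptionX assembles as ebx

.data
    Label2jmpb  dd  Labelb1, Labelb2, Labelb3, Labelb4

.code
start:
    mov     OptionX, -1                 ; one below the first table index
looptodo:
    inc     OptionX                     ; 0, 1, 2, 3 on successive passes
    jmp     DWORD PTR [Label2jmpb + 4*OptionX]

Labelb1:
    print   "first", 13, 10
    jmp     looptodo
Labelb2:
    print   "second", 13, 10
    jmp     looptodo
Labelb3:
    print   "third", 13, 10
    jmp     looptodo
Labelb4:
    print   "fourth", 13, 10
    exit
end start
```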
Title: Re: I'd like it shorter
Post by: jj2007 on December 30, 2012, 11:16:25 AM
Time for some timings ;)

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles

2226    cycles for 100 * JJ simple
2767    cycles for 100 * Frank
2919    cycles for 100 * Dave

2228    cycles for 100 * JJ simple
2913    cycles for 100 * Frank
2923    cycles for 100 * Dave

2225    cycles for 100 * JJ simple
2918    cycles for 100 * Frank
2924    cycles for 100 * Dave

27      bytes for JJ simple
48      bytes for Frank
49      bytes for Dave
Title: Re: I'd like it shorter
Post by: frktons on December 30, 2012, 12:03:16 PM
Very good Jochen  :t
Your version is much shorter and also faster. I think
I had already adopted it on a hunch  :lol:

Didn't you test Tedd's code?
Title: Re: I'd like it shorter
Post by: sinsi on December 30, 2012, 02:01:34 PM
AMD doesn't like this...

AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 227/100 cycles

7196    cycles for 100 * JJ simple
8552    cycles for 100 * Frank
8542    cycles for 100 * Dave

7284    cycles for 100 * JJ simple
8464    cycles for 100 * Frank
8461    cycles for 100 * Dave

7199    cycles for 100 * JJ simple
8415    cycles for 100 * Frank
8424    cycles for 100 * Dave

27      bytes for JJ simple
48      bytes for Frank
49      bytes for Dave
Title: Re: I'd like it shorter
Post by: FORTRANS on December 31, 2012, 05:13:05 AM
Hi,

   A mishmash of results.  jj's is consistently faster.


pre-P4
loop overhead is approx. 224/100 cycles

6562 cycles for 100 * JJ simple
9383 cycles for 100 * Frank
7870 cycles for 100 * Dave

6560 cycles for 100 * JJ simple
9389 cycles for 100 * Frank
7873 cycles for 100 * Dave

6562 cycles for 100 * JJ simple
9378 cycles for 100 * Frank
7916 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave


--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
loop overhead is approx. 211/100 cycles

2206 cycles for 100 * JJ simple
2712 cycles for 100 * Frank
3119 cycles for 100 * Dave

2229 cycles for 100 * JJ simple
2860 cycles for 100 * Frank
3119 cycles for 100 * Dave

2239 cycles for 100 * JJ simple
2684 cycles for 100 * Frank
3152 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave


--- ok ---

pre-P4 (SSE1)
loop overhead is approx. 210/100 cycles

7279 cycles for 100 * JJ simple
10285 cycles for 100 * Frank
10122 cycles for 100 * Dave

7278 cycles for 100 * JJ simple
10296 cycles for 100 * Frank
10078 cycles for 100 * Dave

7289 cycles for 100 * JJ simple
10279 cycles for 100 * Frank
10176 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave


--- ok ---


Cheers,

Steve N.