I'd like it shorter

Started by frktons, December 30, 2012, 07:51:23 AM

frktons

I use these instructions to jump through a jump table, and I think
there are too many of them.

Is there a shorter and faster solution?

.data

    Label2jmpb   dd  Labelb1, Labelb2, Labelb3, Labelb4
    PtrLabelb    dd  Label2jmpb
    OptionX      dd  0

.code

looptodo:
    inc   OptionX
    lea   edx, Label2jmpb
    mov   esi, OptionX
    dec   esi
    shl   esi, 2
    add   edx, esi

    jmp   dword ptr [edx]
.....

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

        mov     esi,OptionX
        inc     esi
        mov     OptionX,esi
        jmp dword ptr Label2jmpb[4*esi-4]


slight difference is that ESI is not the same value when you get to the branch
nor is EDX, because i didn't use it

Tedd


mov eax,OptionX
inc OptionX
jmp DWORD PTR [Label2jmpb + 4*eax]



You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.
Potato2

jj2007

OptionX equ <ebx>  ; or any other register that won't be trashed

inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]

Variant:
push [Label2jmpb + 4*OptionX]
retn

frktons

Nice examples, thank you all for your help.
I'll see which one adapts best to the
case where I plan to use it.
:t

Dave, I have a doubt about the line:

        jmp dword ptr Label2jmpb[4*esi-4]

Shouldn't it be:

        jmp dword ptr Label2jmpb[4*esi-8]
?

If OptionX = 1, what happens with your code?
I think it jumps to the second label, not the first.
I'm assuming OptionX = 1 means the first option, the first jump.
Maybe you considered OptionX = 0 as the first one?

Well, I can see that all the posted examples assume this.
My routine works differently, but that is not a problem;
I can easily adapt all of them to my needs.

frktons

Quote from: jj2007 on December 30, 2012, 08:22:39 AM
OptionX equ <ebx>  ; or any other register that won't be trashed

inc OptionX
jmp DWORD PTR [Label2jmpb + 4*OptionX]

Variant:
push [Label2jmpb + 4*OptionX]
retn

Jochen, I tried to implement your solution, but the program hangs.
Any idea why? Did I make a mistake?

    OptionX equ <ebx>

    mov OptionX, [Elem-1]
    jmp DWORD PTR [Label2jmpb + 4*OptionX]


Elem contains a number from 1 to 4, and is a dword.

dedndave

let's say that OptionX is currently = 0
    inc   OptionX
    lea   edx, Label2jmpb
    mov   esi, OptionX
    dec   esi                         ;<-------------------
    shl   esi, 2
    add   edx, esi

    jmp   dword ptr [edx]


OptionX is now = 1 and you branch to the first label in the list

----------------------------------

let's say that OptionX is currently = 0
        mov     esi,OptionX
        inc     esi
        mov     OptionX,esi
        jmp dword ptr Label2jmpb[4*esi-4]


OptionX is now = 1 and you branch to the first label in the list
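
For anyone who wants to assemble the second walk-through, here is a minimal sketch of a complete program. It assumes the MASM32 SDK is installed at its default \masm32 path. The TABLE_COUNT constant, the range check, the handler bodies and the "finished" label are illustrative additions and not part of the posted code; only the dispatch lines come from the walk-through.

```asm
include \masm32\include\masm32rt.inc

TABLE_COUNT equ 4                       ; hypothetical name: number of table entries

.data
    Label2jmpb  dd  Labelb1, Labelb2, Labelb3, Labelb4
    OptionX     dd  0                   ; becomes 1..4 as the loop runs

.code
start:
looptodo:
    mov   esi, OptionX
    inc   esi                           ; 1-based option number
    cmp   esi, TABLE_COUNT
    ja    finished                      ; added check: never index past the table
    mov   OptionX, esi
    jmp   dword ptr Label2jmpb[4*esi-4] ; option 1 -> offset 0, option 4 -> offset 12

Labelb1:
    print "option 1",13,10
    jmp   looptodo
Labelb2:
    print "option 2",13,10
    jmp   looptodo
Labelb3:
    print "option 3",13,10
    jmp   looptodo
Labelb4:
    print "option 4",13,10
    jmp   looptodo

finished:
    exit
end start
```

The "-4" in the displacement is what maps the 1-based counter onto a 0-based table, so a counter that already starts at 0 and is used before the increment would drop it. The range check is there because none of the posted snippets stop the counter at the end of the table.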

dedndave

i think Jochen's hangs because you can't take "4*[label]"
the assembler probably sees that as "4*offset label"

dedndave

as for Tedd,
Quote from: Tedd
You're jumping to an arbitrary location from a memory pointer, so there's no point worrying over speed.
a clock cycle is a clock cycle - lol

n clock cycles + x clock cycles = (n + x) clock cycles   :biggrin:
(distributive law of clock cycles ???)

HOWEVER, i am not sure mine is any faster, as there are dependencies at every step
probably a wash   :P
Tedd's code looks like it might be a couple bytes shorter

frktons

OK dave, if you start with OptionX = 0 I understand your code  :t
As for Jochen's code, I have to check whether I added something strange, as
you suggest.
Let's see.

"4*offset label"

Well apparently OptionX is ebx, so the code is doing 4*ebx.
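
Since OptionX is a text macro for ebx, the scaled index itself is legal, so a more likely suspect is the load in the earlier snippet. In MASM, [Elem-1] addresses the dword that starts one byte before Elem; it does not subtract 1 from the value stored in Elem. The sketch below tries to make that visible. It assumes the MASM32 SDK at its default path, and the Pad byte is a hypothetical filler added only so that the byte in front of Elem is known.

```asm
include \masm32\include\masm32rt.inc

OptionX equ <ebx>               ; text macro: OptionX is assembled as ebx

.data
    Pad   db  0                 ; hypothetical filler byte just before Elem
    Elem  dd  3                 ; option number in the range 1..4

.code
start:
    mov   OptionX, [Elem-1]     ; reads 4 bytes starting ONE BYTE before Elem
    print str$(ebx),13,10       ; with Pad = 0 this is 300h (768), not 2

    mov   OptionX, Elem         ; load the value itself
    dec   OptionX               ; 1..4 -> 0..3
    print str$(ebx),13,10       ; 2, a valid index into a 4-entry table

    exit
end start
```

If that is what happened, an index in the hundreds would send the jmp through whatever dword lies far beyond the table, which would fit the hang. Loading the value first and folding the adjustment into the displacement, for example "mov ebx, Elem" followed by "jmp DWORD PTR [Label2jmpb + 4*ebx - 4]", would keep Elem 1-based.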

frktons

Oh I found a solution. Starting from OptionX = -1
and avoiding:

mov OptionX, [Elem-1]


Now it works.

jj2007

Time for some timings ;)

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
loop overhead is approx. 188/100 cycles

2226    cycles for 100 * JJ simple
2767    cycles for 100 * Frank
2919    cycles for 100 * Dave

2228    cycles for 100 * JJ simple
2913    cycles for 100 * Frank
2923    cycles for 100 * Dave

2225    cycles for 100 * JJ simple
2918    cycles for 100 * Frank
2924    cycles for 100 * Dave

27      bytes for JJ simple
48      bytes for Frank
49      bytes for Dave

frktons

Very good Jochen  :t
Your version is much shorter and also faster. I think
I had already adopted it on a secret intuition  :lol:

Didn't you test Tedd's code?

sinsi

AMD doesn't like this...

AMD Phenom(tm) II X6 1100T Processor (SSE3)
loop overhead is approx. 227/100 cycles

7196    cycles for 100 * JJ simple
8552    cycles for 100 * Frank
8542    cycles for 100 * Dave

7284    cycles for 100 * JJ simple
8464    cycles for 100 * Frank
8461    cycles for 100 * Dave

7199    cycles for 100 * JJ simple
8415    cycles for 100 * Frank
8424    cycles for 100 * Dave

27      bytes for JJ simple
48      bytes for Frank
49      bytes for Dave
Windows eleven is better :biggrin:

FORTRANS

Hi,

   A mishmash of results.  jj's is consistently faster.


pre-P4
loop overhead is approx. 224/100 cycles

6562 cycles for 100 * JJ simple
9383 cycles for 100 * Frank
7870 cycles for 100 * Dave

6560 cycles for 100 * JJ simple
9389 cycles for 100 * Frank
7873 cycles for 100 * Dave

6562 cycles for 100 * JJ simple
9378 cycles for 100 * Frank
7916 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave


--- ok ---

Mobile Intel(R) Celeron(R) processor     600MHz (SSE2)
loop overhead is approx. 211/100 cycles

2206 cycles for 100 * JJ simple
2712 cycles for 100 * Frank
3119 cycles for 100 * Dave

2229 cycles for 100 * JJ simple
2860 cycles for 100 * Frank
3119 cycles for 100 * Dave

2239 cycles for 100 * JJ simple
2684 cycles for 100 * Frank
3152 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave


--- ok ---

pre-P4 (SSE1)
loop overhead is approx. 210/100 cycles

7279 cycles for 100 * JJ simple
10285 cycles for 100 * Frank
10122 cycles for 100 * Dave

7278 cycles for 100 * JJ simple
10296 cycles for 100 * Frank
10078 cycles for 100 * Dave

7289 cycles for 100 * JJ simple
10279 cycles for 100 * Frank
10176 cycles for 100 * Dave

27 bytes for JJ simple
48 bytes for Frank
49 bytes for Dave


--- ok ---


Cheers,

Steve N.