News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

ArrayIndex timings

Started by jj2007, February 06, 2019, 01:12:52 AM

Previous topic - Next topic

jj2007

A test for the new binary search ArrayIndex(somearray, numbertofind) macro. The first rows use the macro, the second set of rows use a linear search with repne scasd. It seems that scasd is quite competitive for small tables :biggrin:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

Timings for ArrayIndex():
29 ms for 1000000 searches with 0 mismatches, table elements=200
30 ms for 1000000 searches with 0 mismatches, table elements=400
26 ms for 1000000 searches with 0 mismatches, table elements=800
31 ms for 1000000 searches with 0 mismatches, table elements=1600
34 ms for 998400 searches with 0 mismatches, table elements=3200
31 ms for 998400 searches with 0 mismatches, table elements=6400
34 ms for 998400 searches with 0 mismatches, table elements=12800
32 ms for 998400 searches with 0 mismatches, table elements=25600

Timings for repne scasd:
96 ms for 1000000 searches with 0 mismatches, table elements=200
155 ms for 1000000 searches with 0 mismatches, table elements=400
296 ms for 1000000 searches with 0 mismatches, table elements=800
599 ms for 1000000 searches with 0 mismatches, table elements=1600
1112 ms for 998400 searches with 0 mismatches, table elements=3200
2209 ms for 998400 searches with 0 mismatches, table elements=6400
4398 ms for 998400 searches with 0 mismatches, table elements=12800
8798 ms for 998400 searches with 0 mismatches, table elements=25600


Source & exe attached. Building requires MasmBasic 5 February 2019 and a text file with at least 25600 numbers, e.g. obtained from prime numbers generated here.

sinsi


Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Timings for ArrayIndex():
10 ms for 500000 searches with 0 mismatches, table elements=100
10 ms for 500000 searches with 0 mismatches, table elements=200
10 ms for 500000 searches with 0 mismatches, table elements=400
12 ms for 500000 searches with 0 mismatches, table elements=800
12 ms for 499200 searches with 0 mismatches, table elements=1600
11 ms for 499200 searches with 0 mismatches, table elements=3200
11 ms for 499200 searches with 0 mismatches, table elements=6400
11 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
20 ms for 500000 searches with 0 mismatches, table elements=100
32 ms for 500000 searches with 0 mismatches, table elements=200
57 ms for 500000 searches with 0 mismatches, table elements=400
108 ms for 500000 searches with 0 mismatches, table elements=800
208 ms for 499200 searches with 0 mismatches, table elements=1600
409 ms for 499200 searches with 0 mismatches, table elements=3200
811 ms for 499200 searches with 0 mismatches, table elements=6400
1621 ms for 499200 searches with 0 mismatches, table elements=12800


Siekmanski

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz

Timings for ArrayIndex():
16 ms for 500000 searches with 0 mismatches, table elements=100
13 ms for 500000 searches with 0 mismatches, table elements=200
15 ms for 500000 searches with 0 mismatches, table elements=400
15 ms for 500000 searches with 0 mismatches, table elements=800
15 ms for 499200 searches with 0 mismatches, table elements=1600
14 ms for 499200 searches with 0 mismatches, table elements=3200
15 ms for 499200 searches with 0 mismatches, table elements=6400
14 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
23 ms for 500000 searches with 0 mismatches, table elements=100
46 ms for 500000 searches with 0 mismatches, table elements=200
67 ms for 500000 searches with 0 mismatches, table elements=400
150 ms for 500000 searches with 0 mismatches, table elements=800
271 ms for 499200 searches with 0 mismatches, table elements=1600
496 ms for 499200 searches with 0 mismatches, table elements=3200
1015 ms for 499200 searches with 0 mismatches, table elements=6400
2102 ms for 499200 searches with 0 mismatches, table elements=12800
Creative coders use backward thinking techniques as a strategy.

guga

Tks JJ
Amazing Speed :dazzled: :dazzled: :dazzled: :dazzled: :dazzled:


Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz

Timings for ArrayIndex():
23 ms for 500000 searches with 0 mismatches, table elements=100
21 ms for 500000 searches with 0 mismatches, table elements=200
21 ms for 500000 searches with 0 mismatches, table elements=400
20 ms for 500000 searches with 0 mismatches, table elements=800
20 ms for 499200 searches with 0 mismatches, table elements=1600
19 ms for 499200 searches with 0 mismatches, table elements=3200
20 ms for 499200 searches with 0 mismatches, table elements=6400
20 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
68 ms for 500000 searches with 0 mismatches, table elements=400
128 ms for 500000 searches with 0 mismatches, table elements=800
312 ms for 499200 searches with 0 mismatches, table elements=1600
480 ms for 499200 searches with 0 mismatches, table elements=3200
976 ms for 499200 searches with 0 mismatches, table elements=6400


is this the fixed version ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

Thanks to all. The algo is pretty fast indeed, same btw also for REAL8 and QWORD. @Guga: No fix so far for the problem with the last element. For performance reasons, I am trying to avoid an extra check, but it's not trivial, and I need more time :(

TBRANSO1

Quote from: guga on February 06, 2019, 10:44:48 AM
Tks JJ
Amazing Speed :dazzled: :dazzled: :dazzled: :dazzled: :dazzled:


Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz

Timings for ArrayIndex():
23 ms for 500000 searches with 0 mismatches, table elements=100
21 ms for 500000 searches with 0 mismatches, table elements=200
21 ms for 500000 searches with 0 mismatches, table elements=400
20 ms for 500000 searches with 0 mismatches, table elements=800
20 ms for 499200 searches with 0 mismatches, table elements=1600
19 ms for 499200 searches with 0 mismatches, table elements=3200
20 ms for 499200 searches with 0 mismatches, table elements=6400
20 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
68 ms for 500000 searches with 0 mismatches, table elements=400
128 ms for 500000 searches with 0 mismatches, table elements=800
312 ms for 499200 searches with 0 mismatches, table elements=1600
480 ms for 499200 searches with 0 mismatches, table elements=3200
976 ms for 499200 searches with 0 mismatches, table elements=6400


having played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!

jj2007

Quote from: TBRANSO1 on February 06, 2019, 05:00:50 PMhaving played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!

Post the testbed, please.

hutch--


Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

Timings for ArrayIndex():
12 ms for 500000 searches with 0 mismatches, table elements=100
12 ms for 500000 searches with 0 mismatches, table elements=200
12 ms for 500000 searches with 0 mismatches, table elements=400
15 ms for 500000 searches with 0 mismatches, table elements=800
14 ms for 499200 searches with 0 mismatches, table elements=1600
14 ms for 499200 searches with 0 mismatches, table elements=3200
14 ms for 499200 searches with 0 mismatches, table elements=6400
13 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
24 ms for 500000 searches with 0 mismatches, table elements=100
39 ms for 500000 searches with 0 mismatches, table elements=200
69 ms for 500000 searches with 0 mismatches, table elements=400
125 ms for 500000 searches with 0 mismatches, table elements=800
241 ms for 499200 searches with 0 mismatches, table elements=1600
491 ms for 499200 searches with 0 mismatches, table elements=3200
957 ms for 499200 searches with 0 mismatches, table elements=6400
1868 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---

aw27

When matrix sizes increase and times may even decrease there is something that does not sound well. Not to talk about the scale of the differences. There are no fat chicken for little money, says a Portuguese proverb.
I did not have a look at what you have done because it is hard for me to dig inside Masm Basic but I will if you don't clarify the miracle.

jj2007

Quote from: AW on February 06, 2019, 11:45:21 PM
When matrix sizes increase and times may even decrease there is something that does not sound well. Not to talk about the scale of the differences. There are no fat chicken for little money, says a Portuguese proverb.

There is no such thing as a free lunch, says the economist. But the timings for the binary search version are pretty stable, not decreasing. Still, I am also skeptical, in fact I had the same suspicion. But the results are correct, except for the high indices (and I am working on that one if I find the time).

guga

AW

the algorithm behaves similar to this technique:

https://academy.realm.io/posts/how-we-beat-cpp-stl-binary-search

I´m testing it here intensively with image processing and it works as expected, except for the last 1 or 2 values on the array (For odd and even sized table). I ported it to the routines i´m using for CieLCH convertion as:

For Dword.

Proc BinSearch:
    Arguments @pTable, @Value, @Size
    Uses esi, ecx, edx

    mov edx D@Value
    mov esi D@pTable
    push esi
    mov ecx D@Size

@StartLoop:
    sar ecx 1
    lea eax D$esi+ecx*4 | je @EndSearch
    cmp edx D$eax  | jl @StartLoop
    mov esi eax | jne @StartLoop

@EndSearch:
    pop ecx
    cmp edx D$eax
    setne dl
    sub eax ecx
    sar eax 2

    .Test_If dl dl ; <---- test dl dl | jz L1> (L1 is the macro at ".Test_End")
        If eax >s D@Size ; < cmp eax Size | jng L2>; (L2 = EndIf macro)
            mov ecx D@Size
            mov eax D$esi+ecx*4

        End_If
    .Test_End

EndP


For Real8 it also works, but i´m using his previous version that don´t have that minor problem of the last elements on the array. (Btw...his previous version also works and is amazingly fast)
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

LiaoMi

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz

Timings for ArrayIndex():
11 ms for 500000 searches with 0 mismatches, table elements=100
11 ms for 500000 searches with 0 mismatches, table elements=200
12 ms for 500000 searches with 0 mismatches, table elements=400
18 ms for 500000 searches with 0 mismatches, table elements=800
25 ms for 499200 searches with 0 mismatches, table elements=1600
13 ms for 499200 searches with 0 mismatches, table elements=3200
13 ms for 499200 searches with 0 mismatches, table elements=6400
12 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
64 ms for 500000 searches with 0 mismatches, table elements=400
122 ms for 500000 searches with 0 mismatches, table elements=800
238 ms for 499200 searches with 0 mismatches, table elements=1600
472 ms for 499200 searches with 0 mismatches, table elements=3200
947 ms for 499200 searches with 0 mismatches, table elements=6400
1887 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---

aw27

I have both difficulties with Masm Basic and Radasm.
However, things don't appear to make sense. Take this:

@StartLoop:
    sar ecx 1
    lea eax D$esi+ecx*4 | je @EndSearch
    ...
@EndSearch:

which I translate to this:

@StartLoop:
   sar ecx, 1
   lea eax, [esi+4*ecx]
   je @EndSearch
...
@EndSearch:

You divide the size in ecx by 2 then you multiply by 4? Aren't you pointing outside of the range?
If I am wrong, then forget what I said it may be due to my illiteracy in either the MB or Radasm languages.

guga

Esi is a pointer to a address, not a value

mov esi D@pTable ; in masm, it is something like:

mov esi Table
(...)
lea eax [esi+ecx*4]

when you divide ecx by 2 you points to the half of the table. Since the table (on this example) is formed by a array of Dwords, you get the address of Table plus X. Ex:

Mytable: dd 1, 2 , 3, 4, 5

Esi = Mytable

To retrieve the value 4, we need to multply the position by 4 (size of each dword). Since 4 is in position 3, to retrieve the value 4 we simply do: MyTable+12 (bytes)

Mytable:
value =1 (MyTable+0 bytes)
value =2 (MyTable+4 bytes)
value =3 (MyTable+8 bytes)
value =4 (MyTable+12 bytes)


It always points inside the range, except for those minor errors in the last 1 or 2 values when the table size is odd or even. (JJ is working on it) :t :t

The Positions on the table are counted from 0 to XX

MyTable: dd 1, 3, 5, 8, 9

Positions are:
0, 1, 2, 3, 4


The algorithm will always search for the nearest values (The smaller one)

So, if you are searching for the value 7, it will return position "2" that corresponds to the value 5 that is the nearest.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

aw27

Well, I saw you loading ecx with the size of the table, or not?

What is this?
mov ecx D@Size

This is the reason I believe you are out of the range.