ArrayIndex timings

jj2007 · February 06, 2019, 01:12:52 AM

A test for the new binary search ArrayIndex(somearray, numbertofind) macro. The first rows use the macro, the second set of rows use a linear search with repne scasd. It seems that scasd is quite competitive for small tables

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

Timings for ArrayIndex():
29 ms for 1000000 searches with 0 mismatches, table elements=200
30 ms for 1000000 searches with 0 mismatches, table elements=400
26 ms for 1000000 searches with 0 mismatches, table elements=800
31 ms for 1000000 searches with 0 mismatches, table elements=1600
34 ms for 998400 searches with 0 mismatches, table elements=3200
31 ms for 998400 searches with 0 mismatches, table elements=6400
34 ms for 998400 searches with 0 mismatches, table elements=12800
32 ms for 998400 searches with 0 mismatches, table elements=25600

Timings for repne scasd:
96 ms for 1000000 searches with 0 mismatches, table elements=200
155 ms for 1000000 searches with 0 mismatches, table elements=400
296 ms for 1000000 searches with 0 mismatches, table elements=800
599 ms for 1000000 searches with 0 mismatches, table elements=1600
1112 ms for 998400 searches with 0 mismatches, table elements=3200
2209 ms for 998400 searches with 0 mismatches, table elements=6400
4398 ms for 998400 searches with 0 mismatches, table elements=12800
8798 ms for 998400 searches with 0 mismatches, table elements=25600

Source & exe attached. Building requires MasmBasic 5 February 2019 and a text file with at least 25600 numbers, e.g. obtained from prime numbers generated here.

sinsi · February 06, 2019, 08:39:27 AM

Code Select


Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Timings for ArrayIndex():
10 ms for 500000 searches with 0 mismatches, table elements=100
10 ms for 500000 searches with 0 mismatches, table elements=200
10 ms for 500000 searches with 0 mismatches, table elements=400
12 ms for 500000 searches with 0 mismatches, table elements=800
12 ms for 499200 searches with 0 mismatches, table elements=1600
11 ms for 499200 searches with 0 mismatches, table elements=3200
11 ms for 499200 searches with 0 mismatches, table elements=6400
11 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
20 ms for 500000 searches with 0 mismatches, table elements=100
32 ms for 500000 searches with 0 mismatches, table elements=200
57 ms for 500000 searches with 0 mismatches, table elements=400
108 ms for 500000 searches with 0 mismatches, table elements=800
208 ms for 499200 searches with 0 mismatches, table elements=1600
409 ms for 499200 searches with 0 mismatches, table elements=3200
811 ms for 499200 searches with 0 mismatches, table elements=6400
1621 ms for 499200 searches with 0 mismatches, table elements=12800

Siekmanski · February 06, 2019, 08:45:07 AM

Code Select

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz

Timings for ArrayIndex():
16 ms for 500000 searches with 0 mismatches, table elements=100
13 ms for 500000 searches with 0 mismatches, table elements=200
15 ms for 500000 searches with 0 mismatches, table elements=400
15 ms for 500000 searches with 0 mismatches, table elements=800
15 ms for 499200 searches with 0 mismatches, table elements=1600
14 ms for 499200 searches with 0 mismatches, table elements=3200
15 ms for 499200 searches with 0 mismatches, table elements=6400
14 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
23 ms for 500000 searches with 0 mismatches, table elements=100
46 ms for 500000 searches with 0 mismatches, table elements=200
67 ms for 500000 searches with 0 mismatches, table elements=400
150 ms for 500000 searches with 0 mismatches, table elements=800
271 ms for 499200 searches with 0 mismatches, table elements=1600
496 ms for 499200 searches with 0 mismatches, table elements=3200
1015 ms for 499200 searches with 0 mismatches, table elements=6400
2102 ms for 499200 searches with 0 mismatches, table elements=12800

guga · February 06, 2019, 10:44:48 AM

Tks JJ
Amazing Speed

Code Select


Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz

Timings for ArrayIndex():
23 ms for 500000 searches with 0 mismatches, table elements=100
21 ms for 500000 searches with 0 mismatches, table elements=200
21 ms for 500000 searches with 0 mismatches, table elements=400
20 ms for 500000 searches with 0 mismatches, table elements=800
20 ms for 499200 searches with 0 mismatches, table elements=1600
19 ms for 499200 searches with 0 mismatches, table elements=3200
20 ms for 499200 searches with 0 mismatches, table elements=6400
20 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
68 ms for 500000 searches with 0 mismatches, table elements=400
128 ms for 500000 searches with 0 mismatches, table elements=800
312 ms for 499200 searches with 0 mismatches, table elements=1600
480 ms for 499200 searches with 0 mismatches, table elements=3200
976 ms for 499200 searches with 0 mismatches, table elements=6400

is this the fixed version ?

jj2007 · February 06, 2019, 01:37:10 PM

Thanks to all. The algo is pretty fast indeed, same btw also for REAL8 and QWORD. @Guga: No fix so far for the problem with the last element. For performance reasons, I am trying to avoid an extra check, but it's not trivial, and I need more time :(

TBRANSO1 · February 06, 2019, 05:00:50 PM

Quote from: guga on February 06, 2019, 10:44:48 AM
Tks JJ
Amazing Speed

Code Select Expand
Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz Timings for ArrayIndex(): 23 ms for 500000 searches with 0 mismatches, table elements=100 21 ms for 500000 searches with 0 mismatches, table elements=200 21 ms for 500000 searches with 0 mismatches, table elements=400 20 ms for 500000 searches with 0 mismatches, table elements=800 20 ms for 499200 searches with 0 mismatches, table elements=1600 19 ms for 499200 searches with 0 mismatches, table elements=3200 20 ms for 499200 searches with 0 mismatches, table elements=6400 20 ms for 499200 searches with 0 mismatches, table elements=12800 Timings for repne scasd: 22 ms for 500000 searches with 0 mismatches, table elements=100 36 ms for 500000 searches with 0 mismatches, table elements=200 68 ms for 500000 searches with 0 mismatches, table elements=400 128 ms for 500000 searches with 0 mismatches, table elements=800 312 ms for 499200 searches with 0 mismatches, table elements=1600 480 ms for 499200 searches with 0 mismatches, table elements=3200 976 ms for 499200 searches with 0 mismatches, table elements=6400

having played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!

jj2007 · February 06, 2019, 07:04:58 PM

Quote from: TBRANSO1 on February 06, 2019, 05:00:50 PMhaving played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!

Post the testbed, please.

hutch-- · February 06, 2019, 07:33:23 PM

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

Timings for ArrayIndex():
12 ms for 500000 searches with 0 mismatches, table elements=100
12 ms for 500000 searches with 0 mismatches, table elements=200
12 ms for 500000 searches with 0 mismatches, table elements=400
15 ms for 500000 searches with 0 mismatches, table elements=800
14 ms for 499200 searches with 0 mismatches, table elements=1600
14 ms for 499200 searches with 0 mismatches, table elements=3200
14 ms for 499200 searches with 0 mismatches, table elements=6400
13 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
24 ms for 500000 searches with 0 mismatches, table elements=100
39 ms for 500000 searches with 0 mismatches, table elements=200
69 ms for 500000 searches with 0 mismatches, table elements=400
125 ms for 500000 searches with 0 mismatches, table elements=800
241 ms for 499200 searches with 0 mismatches, table elements=1600
491 ms for 499200 searches with 0 mismatches, table elements=3200
957 ms for 499200 searches with 0 mismatches, table elements=6400
1868 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---

aw27 · February 06, 2019, 11:45:21 PM

When matrix sizes increase and times may even decrease there is something that does not sound well. Not to talk about the scale of the differences. There are no fat chicken for little money, says a Portuguese proverb.
I did not have a look at what you have done because it is hard for me to dig inside Masm Basic but I will if you don't clarify the miracle.

jj2007 · February 07, 2019, 12:21:18 AM

Quote from: AW on February 06, 2019, 11:45:21 PM
When matrix sizes increase and times may even decrease there is something that does not sound well. Not to talk about the scale of the differences. There are no fat chicken for little money, says a Portuguese proverb.

There is no such thing as a free lunch, says the economist. But the timings for the binary search version are pretty stable, not decreasing. Still, I am also skeptical, in fact I had the same suspicion. But the results are correct, except for the high indices (and I am working on that one if I find the time).

guga · February 07, 2019, 12:31:15 AM

AW

the algorithm behaves similar to this technique:

https://academy.realm.io/posts/how-we-beat-cpp-stl-binary-search

I´m testing it here intensively with image processing and it works as expected, except for the last 1 or 2 values on the array (For odd and even sized table). I ported it to the routines i´m using for CieLCH convertion as:

For Dword.

Code Select


Proc BinSearch:
    Arguments @pTable, @Value, @Size
    Uses esi, ecx, edx

    mov edx D@Value
    mov esi D@pTable
    push esi
    mov ecx D@Size

@StartLoop:
    sar ecx 1
    lea eax D$esi+ecx*4 | je @EndSearch
    cmp edx D$eax  | jl @StartLoop
    mov esi eax | jne @StartLoop

@EndSearch:
    pop ecx
    cmp edx D$eax
    setne dl
    sub eax ecx
    sar eax 2

    .Test_If dl dl ; <---- test dl dl | jz L1> (L1 is the macro at ".Test_End")
        If eax >s D@Size ; < cmp eax Size | jng L2>; (L2 = EndIf macro)
            mov ecx D@Size
            mov eax D$esi+ecx*4

        End_If
    .Test_End

EndP

For Real8 it also works, but i´m using his previous version that don´t have that minor problem of the last elements on the array. (Btw...his previous version also works and is amazingly fast)

LiaoMi · February 07, 2019, 01:37:24 AM

Code Select

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz

Timings for ArrayIndex():
11 ms for 500000 searches with 0 mismatches, table elements=100
11 ms for 500000 searches with 0 mismatches, table elements=200
12 ms for 500000 searches with 0 mismatches, table elements=400
18 ms for 500000 searches with 0 mismatches, table elements=800
25 ms for 499200 searches with 0 mismatches, table elements=1600
13 ms for 499200 searches with 0 mismatches, table elements=3200
13 ms for 499200 searches with 0 mismatches, table elements=6400
12 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
64 ms for 500000 searches with 0 mismatches, table elements=400
122 ms for 500000 searches with 0 mismatches, table elements=800
238 ms for 499200 searches with 0 mismatches, table elements=1600
472 ms for 499200 searches with 0 mismatches, table elements=3200
947 ms for 499200 searches with 0 mismatches, table elements=6400
1887 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---

aw27 · February 07, 2019, 01:39:14 AM

I have both difficulties with Masm Basic and Radasm.
However, things don't appear to make sense. Take this:

@StartLoop:
sar ecx 1
lea eax D$esi+ecx*4 | je @EndSearch
...
@EndSearch:

which I translate to this:

@StartLoop:
   sar ecx, 1
   lea eax, [esi+4*ecx]
   je @EndSearch
...
@EndSearch:

You divide the size in ecx by 2 then you multiply by 4? Aren't you pointing outside of the range?
If I am wrong, then forget what I said it may be due to my illiteracy in either the MB or Radasm languages.

guga · February 07, 2019, 02:01:05 AM

Esi is a pointer to a address, not a value

mov esi D@pTable ; in masm, it is something like:

mov esi Table
(...)
lea eax [esi+ecx*4]

when you divide ecx by 2 you points to the half of the table. Since the table (on this example) is formed by a array of Dwords, you get the address of Table plus X. Ex:

Mytable: dd 1, 2 , 3, 4, 5

Esi = Mytable

To retrieve the value 4, we need to multply the position by 4 (size of each dword). Since 4 is in position 3, to retrieve the value 4 we simply do: MyTable+12 (bytes)

Mytable:
value =1 (MyTable+0 bytes)
value =2 (MyTable+4 bytes)
value =3 (MyTable+8 bytes)
value =4 (MyTable+12 bytes)

It always points inside the range, except for those minor errors in the last 1 or 2 values when the table size is odd or even. (JJ is working on it) :t :t

The Positions on the table are counted from 0 to XX

MyTable: dd 1, 3, 5, 8, 9

Positions are:
0, 1, 2, 3, 4

The algorithm will always search for the nearest values (The smaller one)

So, if you are searching for the value 7, it will return position "2" that corresponds to the value 5 that is the nearest.

aw27 · February 07, 2019, 02:06:03 AM

Well, I saw you loading ecx with the size of the table, or not?

What is this?
mov ecx D@Size

This is the reason I believe you are out of the range.

The MASM Forum

News:

ArrayIndex timings

jj2007

sinsi

Siekmanski

guga

jj2007

TBRANSO1

jj2007

hutch--

aw27

jj2007

guga

LiaoMi

aw27

guga

aw27