The MASM Forum

General => The Laboratory => Topic started by: jj2007 on February 06, 2019, 01:12:52 AM

Title: ArrayIndex timings
Post by: jj2007 on February 06, 2019, 01:12:52 AM
A test for the new binary search ArrayIndex(somearray, numbertofind) macro. The first rows use the macro, the second set of rows use a linear search with repne scasd. It seems that scasd is quite competitive for small tables :biggrin:

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

Timings for ArrayIndex():
29 ms for 1000000 searches with 0 mismatches, table elements=200
30 ms for 1000000 searches with 0 mismatches, table elements=400
26 ms for 1000000 searches with 0 mismatches, table elements=800
31 ms for 1000000 searches with 0 mismatches, table elements=1600
34 ms for 998400 searches with 0 mismatches, table elements=3200
31 ms for 998400 searches with 0 mismatches, table elements=6400
34 ms for 998400 searches with 0 mismatches, table elements=12800
32 ms for 998400 searches with 0 mismatches, table elements=25600

Timings for repne scasd:
96 ms for 1000000 searches with 0 mismatches, table elements=200
155 ms for 1000000 searches with 0 mismatches, table elements=400
296 ms for 1000000 searches with 0 mismatches, table elements=800
599 ms for 1000000 searches with 0 mismatches, table elements=1600
1112 ms for 998400 searches with 0 mismatches, table elements=3200
2209 ms for 998400 searches with 0 mismatches, table elements=6400
4398 ms for 998400 searches with 0 mismatches, table elements=12800
8798 ms for 998400 searches with 0 mismatches, table elements=25600

Source & exe attached. Building requires MasmBasic 5 February 2019 (http://masm32.com/board/index.php?topic=94.0) and a text file with at least 25600 numbers, e.g. obtained from prime numbers generated here (https://www.browserling.com/tools/prime-numbers).
Title: Re: ArrayIndex timings
Post by: sinsi on February 06, 2019, 08:39:27 AM
Code: [Select]
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Timings for ArrayIndex():
10 ms for 500000 searches with 0 mismatches, table elements=100
10 ms for 500000 searches with 0 mismatches, table elements=200
10 ms for 500000 searches with 0 mismatches, table elements=400
12 ms for 500000 searches with 0 mismatches, table elements=800
12 ms for 499200 searches with 0 mismatches, table elements=1600
11 ms for 499200 searches with 0 mismatches, table elements=3200
11 ms for 499200 searches with 0 mismatches, table elements=6400
11 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
20 ms for 500000 searches with 0 mismatches, table elements=100
32 ms for 500000 searches with 0 mismatches, table elements=200
57 ms for 500000 searches with 0 mismatches, table elements=400
108 ms for 500000 searches with 0 mismatches, table elements=800
208 ms for 499200 searches with 0 mismatches, table elements=1600
409 ms for 499200 searches with 0 mismatches, table elements=3200
811 ms for 499200 searches with 0 mismatches, table elements=6400
1621 ms for 499200 searches with 0 mismatches, table elements=12800
Title: Re: ArrayIndex timings
Post by: Siekmanski on February 06, 2019, 08:45:07 AM
Code: [Select]
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz

Timings for ArrayIndex():
16 ms for 500000 searches with 0 mismatches, table elements=100
13 ms for 500000 searches with 0 mismatches, table elements=200
15 ms for 500000 searches with 0 mismatches, table elements=400
15 ms for 500000 searches with 0 mismatches, table elements=800
15 ms for 499200 searches with 0 mismatches, table elements=1600
14 ms for 499200 searches with 0 mismatches, table elements=3200
15 ms for 499200 searches with 0 mismatches, table elements=6400
14 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
23 ms for 500000 searches with 0 mismatches, table elements=100
46 ms for 500000 searches with 0 mismatches, table elements=200
67 ms for 500000 searches with 0 mismatches, table elements=400
150 ms for 500000 searches with 0 mismatches, table elements=800
271 ms for 499200 searches with 0 mismatches, table elements=1600
496 ms for 499200 searches with 0 mismatches, table elements=3200
1015 ms for 499200 searches with 0 mismatches, table elements=6400
2102 ms for 499200 searches with 0 mismatches, table elements=12800
Title: Re: ArrayIndex timings
Post by: guga on February 06, 2019, 10:44:48 AM
Tks JJ
Amazing Speed :dazzled: :dazzled: :dazzled: :dazzled: :dazzled:

Code: [Select]
Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz

Timings for ArrayIndex():
23 ms for 500000 searches with 0 mismatches, table elements=100
21 ms for 500000 searches with 0 mismatches, table elements=200
21 ms for 500000 searches with 0 mismatches, table elements=400
20 ms for 500000 searches with 0 mismatches, table elements=800
20 ms for 499200 searches with 0 mismatches, table elements=1600
19 ms for 499200 searches with 0 mismatches, table elements=3200
20 ms for 499200 searches with 0 mismatches, table elements=6400
20 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
68 ms for 500000 searches with 0 mismatches, table elements=400
128 ms for 500000 searches with 0 mismatches, table elements=800
312 ms for 499200 searches with 0 mismatches, table elements=1600
480 ms for 499200 searches with 0 mismatches, table elements=3200
976 ms for 499200 searches with 0 mismatches, table elements=6400

is this the fixed version ?
Title: Re: ArrayIndex timings
Post by: jj2007 on February 06, 2019, 01:37:10 PM
Thanks to all. The algo is pretty fast indeed, same btw also for REAL8 and QWORD. @Guga: No fix so far for the problem with the last element. For performance reasons, I am trying to avoid an extra check, but it's not trivial, and I need more time :(
Title: Re: ArrayIndex timings
Post by: TBRANSO1 on February 06, 2019, 05:00:50 PM
Tks JJ
Amazing Speed :dazzled: :dazzled: :dazzled: :dazzled: :dazzled:

Code: [Select]
Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz

Timings for ArrayIndex():
23 ms for 500000 searches with 0 mismatches, table elements=100
21 ms for 500000 searches with 0 mismatches, table elements=200
21 ms for 500000 searches with 0 mismatches, table elements=400
20 ms for 500000 searches with 0 mismatches, table elements=800
20 ms for 499200 searches with 0 mismatches, table elements=1600
19 ms for 499200 searches with 0 mismatches, table elements=3200
20 ms for 499200 searches with 0 mismatches, table elements=6400
20 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
68 ms for 500000 searches with 0 mismatches, table elements=400
128 ms for 500000 searches with 0 mismatches, table elements=800
312 ms for 499200 searches with 0 mismatches, table elements=1600
480 ms for 499200 searches with 0 mismatches, table elements=3200
976 ms for 499200 searches with 0 mismatches, table elements=6400

having played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!
Title: Re: ArrayIndex timings
Post by: jj2007 on February 06, 2019, 07:04:58 PM
having played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!

Post the testbed, please.
Title: Re: ArrayIndex timings
Post by: hutch-- on February 06, 2019, 07:33:23 PM

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz

Timings for ArrayIndex():
12 ms for 500000 searches with 0 mismatches, table elements=100
12 ms for 500000 searches with 0 mismatches, table elements=200
12 ms for 500000 searches with 0 mismatches, table elements=400
15 ms for 500000 searches with 0 mismatches, table elements=800
14 ms for 499200 searches with 0 mismatches, table elements=1600
14 ms for 499200 searches with 0 mismatches, table elements=3200
14 ms for 499200 searches with 0 mismatches, table elements=6400
13 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
24 ms for 500000 searches with 0 mismatches, table elements=100
39 ms for 500000 searches with 0 mismatches, table elements=200
69 ms for 500000 searches with 0 mismatches, table elements=400
125 ms for 500000 searches with 0 mismatches, table elements=800
241 ms for 499200 searches with 0 mismatches, table elements=1600
491 ms for 499200 searches with 0 mismatches, table elements=3200
957 ms for 499200 searches with 0 mismatches, table elements=6400
1868 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---
Title: Re: ArrayIndex timings
Post by: AW on February 06, 2019, 11:45:21 PM
When matrix sizes increase and times may even decrease there is something that does not sound well. Not to talk about the scale of the differences. There are no fat chicken for little money, says a Portuguese proverb.
I did not have a look at what you have done because it is hard for me to dig inside Masm Basic but I will if you don't clarify the miracle.
Title: Re: ArrayIndex timings
Post by: jj2007 on February 07, 2019, 12:21:18 AM
When matrix sizes increase and times may even decrease there is something that does not sound well. Not to talk about the scale of the differences. There are no fat chicken for little money, says a Portuguese proverb.

There is no such thing as a free lunch, says the economist. But the timings for the binary search version are pretty stable, not decreasing. Still, I am also skeptical, in fact I had the same suspicion. But the results are correct, except for the high indices (and I am working on that one if I find the time).
Title: Re: ArrayIndex timings
Post by: guga on February 07, 2019, 12:31:15 AM
AW

the algorithm behaves similar to this technique:

https://academy.realm.io/posts/how-we-beat-cpp-stl-binary-search

I´m testing it here intensively with image processing and it works as expected, except for the last 1 or 2 values on the array (For odd and even sized table). I ported it to the routines i´m using for CieLCH convertion as:

For Dword.
Code: [Select]
Proc BinSearch:
    Arguments @pTable, @Value, @Size
    Uses esi, ecx, edx

    mov edx D@Value
    mov esi D@pTable
    push esi
    mov ecx D@Size

@StartLoop:
    sar ecx 1
    lea eax D$esi+ecx*4 | je @EndSearch
    cmp edx D$eax  | jl @StartLoop
    mov esi eax | jne @StartLoop

@EndSearch:
    pop ecx
    cmp edx D$eax
    setne dl
    sub eax ecx
    sar eax 2

    .Test_If dl dl ; <---- test dl dl | jz L1> (L1 is the macro at ".Test_End")
        If eax >s D@Size ; < cmp eax Size | jng L2>; (L2 = EndIf macro)
            mov ecx D@Size
            mov eax D$esi+ecx*4

        End_If
    .Test_End

EndP

For Real8 it also works, but i´m using his previous version that don´t have that minor problem of the last elements on the array. (Btw...his previous version also works and is amazingly fast)
Title: Re: ArrayIndex timings
Post by: LiaoMi on February 07, 2019, 01:37:24 AM
Code: [Select]
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz

Timings for ArrayIndex():
11 ms for 500000 searches with 0 mismatches, table elements=100
11 ms for 500000 searches with 0 mismatches, table elements=200
12 ms for 500000 searches with 0 mismatches, table elements=400
18 ms for 500000 searches with 0 mismatches, table elements=800
25 ms for 499200 searches with 0 mismatches, table elements=1600
13 ms for 499200 searches with 0 mismatches, table elements=3200
13 ms for 499200 searches with 0 mismatches, table elements=6400
12 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
36 ms for 500000 searches with 0 mismatches, table elements=200
64 ms for 500000 searches with 0 mismatches, table elements=400
122 ms for 500000 searches with 0 mismatches, table elements=800
238 ms for 499200 searches with 0 mismatches, table elements=1600
472 ms for 499200 searches with 0 mismatches, table elements=3200
947 ms for 499200 searches with 0 mismatches, table elements=6400
1887 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---
Title: Re: ArrayIndex timings
Post by: AW on February 07, 2019, 01:39:14 AM
I have both difficulties with Masm Basic and Radasm.
However, things don't appear to make sense. Take this:

@StartLoop:
    sar ecx 1
    lea eax D$esi+ecx*4 | je @EndSearch
    ...
@EndSearch:

which I translate to this:

@StartLoop:
   sar ecx, 1
   lea eax, [esi+4*ecx]
   je @EndSearch
...
@EndSearch:

You divide the size in ecx by 2 then you multiply by 4? Aren't you pointing outside of the range?
If I am wrong, then forget what I said it may be due to my illiteracy in either the MB or Radasm languages.
Title: Re: ArrayIndex timings
Post by: guga on February 07, 2019, 02:01:05 AM
Esi is a pointer to a address, not a value

mov esi D@pTable ; in masm, it is something like:

mov esi Table
(...)
lea eax [esi+ecx*4]

when you divide ecx by 2 you points to the half of the table. Since the table (on this example) is formed by a array of Dwords, you get the address of Table plus X. Ex:

Mytable: dd 1, 2 , 3, 4, 5

Esi = Mytable

To retrieve the value 4, we need to multply the position by 4 (size of each dword). Since 4 is in position 3, to retrieve the value 4 we simply do: MyTable+12 (bytes)

Mytable:
value =1 (MyTable+0 bytes)
value =2 (MyTable+4 bytes)
value =3 (MyTable+8 bytes)
value =4 (MyTable+12 bytes)


It always points inside the range, except for those minor errors in the last 1 or 2 values when the table size is odd or even. (JJ is working on it) :t :t

The Positions on the table are counted from 0 to XX

MyTable: dd 1, 3, 5, 8, 9

Positions are:
0, 1, 2, 3, 4


The algorithm will always search for the nearest values (The smaller one)

So, if you are searching for the value 7, it will return position "2" that corresponds to the value 5 that is the nearest.
Title: Re: ArrayIndex timings
Post by: AW on February 07, 2019, 02:06:03 AM
Well, I saw you loading ecx with the size of the table, or not?

What is this?
mov ecx D@Size

This is the reason I believe you are out of the range.
Title: Re: ArrayIndex timings
Post by: guga on February 07, 2019, 02:08:01 AM
Yes, ecx is the size of the table.

I´m testing it and so far, it is inside the range except those issues i told before.
Title: Re: ArrayIndex timings
Post by: AW on February 07, 2019, 02:12:59 AM
Imagine table size is 256 bytes.
You divide ecx by 2, next you multiply by 4, you get 512.
Since esi points to the table base how:
lea eax, [esi+4*ecx] is in range? How lea eax, [esi+512] is inside the table.
Title: Re: ArrayIndex timings
Post by: guga on February 07, 2019, 02:20:54 AM
A table of 256 bytes, means you have 64 dwords and not 512. (256/4), thus, the total size (amount of elements on the table is 64 and not 256)

Image esi pointing to the address 0401000 (MyTable).  Esi will never change. It´s a fixed address

lea load the effective adress of that DWORD and not a byte.

ecx = 256 bytes = 64 DWORDS

when you divide the size by 2 the resultant value (in DWORDS) are 32.

So, lea [esi+4*ecx] means. esi+4*32 (dwords). So, esi will points to the start of MyTable plus 128 bytes and not 512.




Title: Re: ArrayIndex timings
Post by: guga on February 07, 2019, 02:24:12 AM
The Size on the code is the total amount of elements on the array and not the total size in bytes
Title: Re: ArrayIndex timings
Post by: TBRANSO1 on February 07, 2019, 02:30:06 AM
having played around with data structures and binary search algos in HLL, that one from Guga on the modern 6 core 12 threaded each CPU is ridiculously fast. damn!

Post the testbed, please.

I will do it for it, give me time to review your work so I can replicated it in these languages: Ruby, Python, Java and Crystal Ruby (a C-written, statically type, completely optimized, multi-threaded, compiled version of Ruby, extremely fast, sometimes faster than C, but just as fast as C).  But you know, it's not going to be contest for Ruby, Python and Java... I am going to guess that they could get close to double-digit seconds range for sequences of over 1m elements... not milliseconds, the only one to be somewhat competitive will be Crystal Ruby.
Title: Re: ArrayIndex timings
Post by: AW on February 07, 2019, 03:08:00 AM
Quote
FoundDwords 76 in position 20. True position = 20
FoundDwords 314 in position 91. True position = 91
FoundDwords 2 in position 2. True position = 1
FoundDwords 418 in position 120. True position = 119
FoundDwords 998 in position 255. True position = 255
FoundDwords 684 in position 185. True position = 185
FoundDwords 173 in position 51. True position = 50
FoundDwords 447 in position 133. True position = 133
FoundDwords 205 in position 60. True position = 60
FoundDwords 538 in position 157. True position = 157
FoundDwords 39 in position 11. True position = 11
FoundDwords 516 in position 148. True position = 149
FoundDwords 950 in position 244. True position = 244
FoundDwords 465 in position 136. True position = 135
FoundDwords 27 in position 6. True position = 6
FoundDwords 38 in position 9. True position = 9
FoundDwords 521 in position 152. True position = 152
FoundDwords 319 in position 92. True position = 92
FoundDwords 135 in position 37. True position = 37
FoundDwords 889 in position 230. True position = 230
FoundDwords 313 in position 90. True position = 90
FoundDwords 876 in position 226. True position = 226
FoundDwords 366 in position 108. True position = 108
FoundDwords 303 in position 87. True position = 87
FoundDwords 629 in position 175. True position = 175
FoundDwords 245 in position 70. True position = 70
FoundDwords 344 in position 100. True position = 101
FoundDwords 153 in position 40. True position = 40
FoundDwords 800 in position 211. True position = 211
FoundDwords 521 in position 152. True position = 152
FoundDwords 533 in position 155. True position = 155
FoundDwords 257 in position 76. True position = 76
FoundDwords 240 in position 69. True position = 69
FoundDwords 247 in position 72. True position = 73
FoundDwords 108 in position 31. True position = 31
FoundDwords 303 in position 87. True position = 87
FoundDwords 245 in position 70. True position = 71
FoundDwords 653 in position 179. True position = 179
FoundDwords 27 in position 7. True position = 6
FoundDwords 26 in position 6. True position = 5
FoundDwords 333 in position 95. True position = 95
FoundDwords 489 in position 145. True position = 145
FoundDwords 906 in position 232. True position = 232
FoundDwords 650 in position 177. True position = 177
FoundDwords 833 in position 217. True position = 217
FoundDwords 444 in position 130. True position = 131
FoundDwords 341 in position 99. True position = 99
FoundDwords 861 in position 223. True position = 223
FoundDwords 238 in position 67. True position = 67
FoundDwords 479 in position 141. True position = 140
FoundDwords 516 in position 148. True position = 150
FoundDwords 653 in position 179. True position = 179
FoundDwords 924 in position 239. True position = 239
FoundDwords 511 in position 147. True position = 147
FoundDwords 584 in position 167. True position = 167
FoundDwords 418 in position 120. True position = 119
FoundDwords 398 in position 116. True position = 116
FoundDwords 886 in position 230. True position = 229
FoundDwords 213 in position 61. True position = 61
FoundDwords 96 in position 25. True position = 25
FoundDwords 164 in position 45. True position = 45
FoundDwords 653 in position 178. True position = 178
FoundDwords 29 in position 7. True position = 7
FoundDwords 611 in position 170. True position = 170
FoundDwords 684 in position 185. True position = 185
FoundDwords 478 in position 139. True position = 139
FoundDwords 175 in position 51. True position = 51
FoundDwords 862 in position 224. True position = 224
FoundDwords 423 in position 121. True position = 121
FoundDwords 908 in position 234. True position = 234
FoundDwords 969 in position 247. True position = 247
FoundDwords 516 in position 148. True position = 149
FoundDwords 84 in position 23. True position = 22
FoundDwords 611 in position 170. True position = 170
FoundDwords 51 in position 12. True position = 12
FoundDwords 245 in position 70. True position = 70
FoundDwords 975 in position 248. True position = 248
FoundDwords 478 in position 139. True position = 139
FoundDwords 165 in position 46. True position = 46
FoundDwords 516 in position 148. True position = 149
FoundDwords 788 in position 207. True position = 207
FoundDwords 636 in position 176. True position = 176
FoundDwords 541 in position 158. True position = 158
FoundDwords 772 in position 200. True position = 200
FoundDwords 58 in position 15. True position = 14
FoundDwords 118 in position 33. True position = 33
FoundDwords 551 in position 160. True position = 160
FoundDwords 247 in position 72. True position = 72
FoundDwords 365 in position 107. True position = 107
FoundDwords 247 in position 74. True position = 73
FoundDwords 983 in position 252. True position = 252
FoundDwords 128 in position 34. True position = 34
FoundDwords 335 in position 97. True position = 97
FoundDwords 418 in position 120. True position = 120
FoundDwords 443 in position 129. True position = 129
FoundDwords 473 in position 138. True position = 137
FoundDwords 629 in position 175. True position = 175
FoundDwords 341 in position 98. True position = 98
FoundDwords 435 in position 127. True position = 126
FoundDwords 288 in position 81. True position = 81

I tested 100 times in an array of 256 dwords and your algo appears to work. Yes there are a small number of values that do not match, it needs some investigation (could be due to duplication, I have not forced unique values).
It may also be possible that larger arrays will not diverge significantly in time because maximum number of iterations is about log2(sizeofarray)
Title: Re: ArrayIndex timings
Post by: jj2007 on February 07, 2019, 03:13:24 AM
I tested 100 times in an array of 256 dwords and your algo appears to work.

The testbed checked the values. The only problem here is that it fails for certain table sizes because of the way the shr ecx, 1 is being used. I have little time right now, but I keep working on a variant that works for all sizes but it mysteriously a factor 8 slower. Oh well...
Title: Re: ArrayIndex timings
Post by: AW on February 07, 2019, 04:46:25 AM
I have removed the duplicates caused by the random generation and retested, so duplicates where the reason for mismatches when array sizes are 2^n in size.
When array sizes are not 2^n (for example 3000) in size there are a lot of issues.
The algorithm I provided earlier for doubles does not have those issues though.
I attach the sources, C and ASM, used for testing with doubles using my previous algo and with dwords using this current algo.
Title: Re: ArrayIndex timings
Post by: TimoVJL on February 07, 2019, 05:20:05 AM
I tested that fast_upper_bound3() C version with primes. Still have to check if it works correctly.
Code: [Select]
int fast_upper_bound3(int *vec, int value, int size)
{
    size_t low = 0;
    while (size > 0) {
        size_t half = size / 2;
        size_t other_half = size - half;
        size_t probe = low + half;
        size_t other_low = low + other_half;
        int v = vec[probe];
        size = half;
        //low = v >= value ? other_low : low;
        low = v >= value ? low : other_low;
    }
    return low;
}
Result: Time 23 ms 12800 499200 loops
jj2007 example gave: 27 ms for 499200 searches with 0 mismatches, table elements=12800

msvc 2010 Time 28 ms 12800 499200 loops
msvc 2017 Time 29 ms 12800 499200 loops
pcc32 Time 39 ms 12800 499200 loops
Title: Re: ArrayIndex timings
Post by: LordAdef on February 07, 2019, 09:06:59 AM
Code: [Select]
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

Timings for ArrayIndex():
5653 µs for 500000 searches with 0 mismatches, table elements=100
12 ms for 500000 searches with 0 mismatches, table elements=200
15 ms for 500000 searches with 0 mismatches, table elements=400
18 ms for 500000 searches with 0 mismatches, table elements=800
13 ms for 499200 searches with 0 mismatches, table elements=1600
13 ms for 499200 searches with 0 mismatches, table elements=3200
13 ms for 499200 searches with 0 mismatches, table elements=6400
13 ms for 499200 searches with 0 mismatches, table elements=12800

Timings for repne scasd:
22 ms for 500000 searches with 0 mismatches, table elements=100
38 ms for 500000 searches with 0 mismatches, table elements=200
66 ms for 500000 searches with 0 mismatches, table elements=400
135 ms for 500000 searches with 0 mismatches, table elements=800
257 ms for 499200 searches with 0 mismatches, table elements=1600
480 ms for 499200 searches with 0 mismatches, table elements=3200
1240 ms for 499200 searches with 0 mismatches, table elements=6400
1887 ms for 499200 searches with 0 mismatches, table elements=12800
--- hit any key ---
Title: Re: ArrayIndex timings
Post by: guga on February 07, 2019, 11:44:19 AM
Tks again, AW. But, I´m not sure if i ported it correctly to masmbasic, but, if the porting is correct, the timings are slower compared to JJ´s

60   bytes for BinSearchJ
97   bytes for BinSearchG
0   1.12482723
1   1.43416376
2   1.58823120
3   1.64153149
4   2.28456212
5   3.36804147
6   3.37144616
...
253   99.2858763
254   99.4469785
255   99.6651549
Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
2000000 random searches performed in 175 ms - repnz scasd

BinSearch, mismatches not allowed
2000000 random searches performed in 144 ms
2000000 random searches performed in 134 ms
2000000 random searches performed in 128 ms
2000000 random searches performed in 134 ms
2000000 random searches performed in 134 ms
2000000 random searches performed in 129 ms
2000000 random searches performed in 127 ms
2000000 random searches performed in 129 ms

BinSearch, mismatches allowed
2000000 random searches performed in 160 ms
2000000 random searches performed in 141 ms
2000000 random searches performed in 138 ms
2000000 random searches performed in 149 ms
2000000 random searches performed in 140 ms
2000000 random searches performed in 140 ms
2000000 random searches performed in 147 ms
Title: Re: ArrayIndex timings
Post by: AW on February 07, 2019, 08:21:09 PM
This shall work for arrays of any size.

I have used 2 testing methods and used an array of 12800 elements tested 500,000 times.
The first method is what appears to be the Masm Basic way: Values to test are not random, belong to the array and increase by one in each iteration.
The second method uses random values within the range of the array but may not belong to the array (this corresponds to the requirement, if I understood well).

I am using inline assembly language to simulate what MASM macros do.

Method: No Random.  Table elements: 12800. Elapsed time: 18 milliseconds
Method: Random.  Table elements: 12800. Elapsed time: 46 milliseconds


Code: [Select]
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <Windows.h>

#define CALCULRANDOM 0
#define NUMTESTS 500000
#define NUMELEMENTS 12800
#define MAXIMUMDWORD 50000
#define MAXUPPERMARGIN 5

unsigned int typedef DWORD;

DWORD dwordArray[NUMELEMENTS];

//******************* Build random array ************************
static int comparedwords(const void * a, const void * b)
{
if (*(DWORD*)a > *(DWORD*)b) return 1;
else if (*(DWORD*)a < *(DWORD*)b) return -1;
else return 0;
}

void genDwords()
{
int i;
DWORD dupsFound;
srand(time(NULL));

for (i = 0; i < NUMELEMENTS; i++)
dwordArray[i] = rand() % MAXIMUMDWORD;

do // Remove duplicates
{
qsort(dwordArray, NUMELEMENTS, sizeof(DWORD), comparedwords);
dupsFound = 0;
for (i = 0; i < NUMELEMENTS - 1; i++)
{
if (dwordArray[i] == dwordArray[i + 1])
{
dwordArray[i + 1] = rand() % MAXIMUMDWORD;
dupsFound = 1;
}
}
} while (dupsFound);
}
// ********************************************************

/* Not used
_inline int binarySearchDword(DWORD sortedArray[], int first, int last, DWORD key)
{

while (first <= last) {
int mid = first + (last - first) / 2;

if (key == sortedArray[mid])
return mid;
else if (key < sortedArray[mid])
last = mid - 1;
else {
if (key < sortedArray[mid + 1])
return mid;
else
first = mid + 1;

}
}
return -1;
}
*/

int main()
{
int selectedPosition=0;
DWORD selectDword;
int retpos;
int testNum;
LARGE_INTEGER frequency;
LARGE_INTEGER t1, t2;
double elapsedTime;

genDwords();  // Generate random array of unique dwords

QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&t1);
for (testNum = 0; testNum < NUMTESTS; testNum++)
{
#if CALCULRANDOM==0
selectDword = dwordArray[selectedPosition];

#elif CALCULRANDOM==1
selectedPosition = rand() % NUMELEMENTS;
selectDword = dwordArray[selectedPosition];
selectDword += rand() % MAXUPPERMARGIN;
if (selectDword >= dwordArray[selectedPosition + 1])
selectDword = dwordArray[selectedPosition];
#endif
_asm
{
push esi
push edi
push ebx
mov esi, NUMELEMENTS - 1
mov ecx, 0
mov ebx, selectDword
startLoop:
mov eax, esi
sub eax, ecx
cdq
sub eax, edx
lea edx, dwordArray
sar eax, 1
add eax, ecx
mov edx, dword ptr[edx + eax * 4]
cmp ebx, edx
je short EndSearch
jae short checkElse
lea esi, [eax - 1]
jmp short EndLoop
checkElse:
lea ecx, dwordArray
cmp ebx, [ecx + eax * 4 + 4]
jb short EndSearch
lea ecx, [eax + 1]
EndLoop:
cmp ecx, esi
jle short startLoop
or eax, -1
EndSearch:
mov retpos, eax
pop ebx
pop edi
pop esi
}

//retpos = binarySearchDword(dwordArray, 0, NUMELEMENTS - 1, selectDword);
if (retpos != selectedPosition)
printf("****************FOUND MISMATCH ********************\n");
#if CALCULRANDOM==0
selectedPosition = ++selectedPosition % NUMELEMENTS;
#endif
}
QueryPerformanceCounter(&t2);
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
printf("Method: %s.  Table elements: %d. Elapsed time: %0.0f milliseconds\n", (CALCULRANDOM)?"Random":"No Random", NUMELEMENTS, elapsedTime);
return 0;
}
Title: Re: ArrayIndex timings
Post by: jj2007 on February 07, 2019, 09:59:26 PM
There is also the CRT bsearch: mov eax, rv(crt_bsearch, addr tmpDW, addr table(0), 256, DWORD, addr cmpFunction)

It would be helpful if alternative algos were posted as somealgo proc uses esi edi pTable, matchtofind, sizeoftable
Title: Re: ArrayIndex timings
Post by: guga on February 08, 2019, 01:13:03 AM
Hi Aw

Tks:) Looks good, but a few problems.

1) Only works with Dwords and not Real8
2) Only works with positive numbers
3) It´s returning -1 whenever the value is outside the ranger when the necessary was that, on this situation it look for it´s extremes. Ex: Array  = 17,35, 44, 66, 99, 120 . If value to find = 5, it should return 0 in eax (The starting index/pos). If the value to find is 200, it should return 5 (The last index/pos)


Jj
Quote
It would be helpful if alternative algos were posted as somealgo proc uses esi edi pTable, matchtofind, sizeoftable

Totally agree :t
Title: Re: ArrayIndex timings
Post by: LiaoMi on February 08, 2019, 01:44:49 AM
Quote
Here are the names of search implementations:

binary_search_std: direct call to std::lower_bound
binary_search_simple: basic binary search with branches
binary_search_branchless: branchless binary search
binary_search_branchless_UR: branchless binary search – fully unrolled
linearX_search_sse: linear search with break (SSE)
linear_search_sse: counting linear search (SSE)
linear_search_sse_UR: counting linear search (SSE) – fully unrolled
linear_search_avx: counting linear search (AVX)
linear_search_avx_UR: counting linear search (AVX) – fully unrolled

(https://dirtyhandscoding.files.wordpress.com/2017/08/plot_search_655361.png)

https://dirtyhandscoding.wordpress.com/2017/08/25/performance-comparison-linear-search-vs-binary-search/ (https://dirtyhandscoding.wordpress.com/2017/08/25/performance-comparison-linear-search-vs-binary-search/)

+

https://stackoverflow.com/questions/18971401/sparse-array-compression-using-simd-avx2 (https://stackoverflow.com/questions/18971401/sparse-array-compression-using-simd-avx2)
Title: Re: ArrayIndex timings
Post by: AW on February 08, 2019, 04:08:10 AM
Hi Aw

Tks:) Looks good, but a few problems.

1) Only works with Dwords and not Real8
2) Only works with positive numbers
3) It´s returning -1 whenever the value is outside the ranger when the necessary was that, on this situation it look for it´s extremes. Ex: Array  = 17,35, 44, 66, 99, 120 . If value to find = 5, it should return 0 in eax (The starting index/pos). If the value to find is 200, it should return 5 (The last index/pos)

1) I have done the Real8 a few days ago.
2) We can generate an array starting where we want, not necessarily in 0. The code makes signed compares, so it will probably work (I believe so, not confirmed).
3) I don't understand what you mean but it should be something pretty easy.  :biggrin:. Anyway, I believe it already does what you want because when the value is above it selects the immediately smaller value in the array. If the value is below the smaller value in the array there is an error indeed but it is easy to cover that case.



Title: Re: ArrayIndex timings
Post by: AW on February 08, 2019, 04:15:48 AM
@LiaoMi

If data is already sorted, except for very small arrays, binary search is king. Of course, there are binary search variations, some will work better with some systems other with other systems. One thing I don't believe, because I have never seen, is the GCC produce faster code for Windows than Visual Studio. This never happen unless we don't know how to configure VS.
Sure we can test but not with algorithms provided by C++ classes and stock libraries. And when I refer C++, I could refer C# or Java. I have nothing against classes, actually I work a lot with Delphi which is based in classes, but I know Delphi is not an example of extreme performance.
Title: Re: ArrayIndex timings
Post by: jj2007 on February 08, 2019, 12:47:44 PM
The bug is fixed, please reinstall (http://masm32.com/board/index.php?topic=94.0) before building the attached source.

Below a simple graph showing the time needed with ArrayIndex(table, find) over the logarithm of the table size.
Title: Re: ArrayIndex timings
Post by: jj2007 on February 08, 2019, 10:42:43 PM
New testbed including CRT bsearch. Note the timings for repne scasd are for a reduced table, 543 instead of 54321 elements.
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

2000000 random searches performed in 407 ms with repnz scasd for a 543 elements table
2000000 random searches performed in 391 ms with bsearch for a 54321 elements table

ArrayIndex, mismatches not allowed
2000000 random searches performed in 174 ms
2000000 random searches performed in 175 ms
2000000 random searches performed in 173 ms
2000000 random searches performed in 174 ms
2000000 random searches performed in 173 ms

ArrayIndex, mismatches allowed
2000000 random searches performed in 204 ms
2000000 random searches performed in 202 ms
2000000 random searches performed in 203 ms
2000000 random searches performed in 203 ms
2000000 random searches performed in 203 ms

In case somebody wants to test it with non-MasmBasic code, the attached macro allows mov xx, ArrayIndex(pArray, match, lengthof Array):
Code: [Select]
include \masm32\include\masm32rt.inc ; almost pure Masm32 code
include ArrayIndex.mac ; not for distribution

.data
MyArray dd -2, -1, 0, 1, 11h, 12h, 13h, 14h, 15h, 16h, 22h, 23h, 25h, 51, 53, 37h, 44h, 45h, 46h, 55h, 66h, 77h, 88h

.code
start:
  xor ebx, ebx
  .Repeat
mov edi, MyArray[4*ebx] ; get a value
print "element #"
print str$(ebx), 9, "value "
print str$(edi), "  ", 9, " was found at pos "
print str$(ArrayIndex(offset MyArray, edi, lengthof MyArray)), 13, 10
inc ebx
 .Until ebx>=lengthof MyArray
 inkey "--- hit any key ---"
  exit

end start
Title: Re: ArrayIndex timings
Post by: guga on February 08, 2019, 11:16:54 PM
Tks a lot, JJ.

One question. How do i enabled to test the Real8 mode ?  And i saw on the code there is a qword too, how to enable them ?
Title: Re: ArrayIndex timings
Post by: LiaoMi on February 09, 2019, 04:02:59 AM
 :t

Code: [Select]
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz

2000000 random searches performed in 361 ms with repnz scasd for a 543 elements table
2000000 random searches performed in 301 ms with bsearch for a 54321 elements table

ArrayIndex, mismatches not allowed
2000000 random searches performed in 143 ms
2000000 random searches performed in 144 ms
2000000 random searches performed in 143 ms
2000000 random searches performed in 146 ms
2000000 random searches performed in 145 ms

ArrayIndex, mismatches allowed
2000000 random searches performed in 166 ms
2000000 random searches performed in 167 ms
2000000 random searches performed in 172 ms
2000000 random searches performed in 168 ms
2000000 random searches performed in 169 ms
--- hit any key ---
Title: Re: ArrayIndex timings
Post by: AW on February 09, 2019, 04:19:56 AM
4 billion tests on a 500,000 elements array!
Now, it works according to Guga's latest specification (for dwords).
I will not spend more time with this, someone will do the real 8, for sure.

***** Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz *****

Number of Tests:4000000, Table Size:500000
Highest Positive Dword:300000, Lowest Negative Dword:-300000
Mismatches not allowed/Sequential Keys
Elapsed time: 177 miliseconds

Number of Tests:4000000, Table Size:500000
Highest Positive Dword:300000, Lowest Negative Dword:-300000
Mismatches allowed/Ramdom Keys, Max. of Random Variation from Table value: 500
Elapsed time: 483 miliseconds
Title: Re: ArrayIndex timings
Post by: jj2007 on February 09, 2019, 06:46:16 AM
One question. How do i enabled to test the Real8 mode ?  And i saw on the code there is a qword too, how to enable them ?

The macro is intelligent: if you pass a REAL8 number to search, it knows what to do with it 8)

The QWORD code won't work, it has a little bug, and I am too tired / not interested enough to fix it. One problem is that comisd is a great instruction, but it reacts allergic against certain QWORDs, saying "that's Not A Number!". The solution is obviously to split the comparison in hi dword, then lo dword, but right now I want to drink a glass of good red wine instead ;)
Title: Re: ArrayIndex timings
Post by: guga on February 09, 2019, 08:10:33 AM
I´ll try on it :) Amazing. many tks :)


(https://static.vinepair.com/wp-content/uploads/2018/09/red-wine-header.jpg)
Title: Re: ArrayIndex timings
Post by: guga on February 09, 2019, 11:14:27 AM
Hi JJ

Found a small problem on the Real8 function. When the number to search is a bit different  from the one in table (problem with rounding mode, for example) it returns the previous position.

For example i want to search for the number 100, but due to the rounding mode or convertion using  FPU operations, it may generates something that is not 00000000h 04590000h .

Ex:
NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh . This also is the number 100 in Real8

MyTable: 1, 19, 21, 39, 41, 55, 90, 100

It will return position 6, instead of 7.


Also...using "comisd ... ja" it is looking only for numbers that are positive, right ?
Title: Re: ArrayIndex timings
Post by: jj2007 on February 09, 2019, 12:27:20 PM
Hi Guga,
In spite of the ja, it works fine with negative numbers (and it fails miserably with jg ;)).

Re "This also is the number 100 in Real8":

Code: [Select]
;   NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh
  push 04058FFFFh
  push 0FFFFFFFEh      ; even 0FFFFFFFFh will not produce a 100.0
  movlps xmm0, qword ptr [esp]
  fld qword ptr [esp]
  deb 4, "rounding error...", f:xmm0, ST(0)
Output:
Code: [Select]
rounding error...
f:xmm0          99.99999999999997
ST(0)           99.99999999999997158
Title: Re: ArrayIndex timings
Post by: guga on February 09, 2019, 01:05:38 PM
Hmm..indeed.

It seems that there is a bug in RosAsm float to string :shock:. On the debugger it is rounding and most likely on the assembler too. I´ll take a look and test. The same thing seems to happens with negative numbers.

Thanks a lot :t :t :t :t
Title: Re: ArrayIndex timings
Post by: AW on February 09, 2019, 01:53:21 PM
Hi Guga,
In spite of the ja, it works fine with negative numbers (and it fails miserably with jg ;)).

Re "This also is the number 100 in Real8":

Code: [Select]
;   NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh
  push 04058FFFFh
  push 0FFFFFFFEh      ; even 0FFFFFFFFh will not produce a 100.0
  movlps xmm0, qword ptr [esp]
  fld qword ptr [esp]
  deb 4, "rounding error...", f:xmm0, ST(0)
Output:
Code: [Select]
rounding error...
f:xmm0          99.99999999999997
ST(0)           99.99999999999997158
Why don't you use movlpd if you are dealing with real8?
Title: Re: ArrayIndex timings
Post by: jj2007 on February 09, 2019, 02:07:22 PM
Real number precision is a can of worms :(

The deb (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1019) macro tries to use the optimal number of digits depending on the size:

Code: [Select]
  fldpi
  fst tmp4  ; REAL4
  fst tmp8  ; REAL8
  fstp tmp10
  deb 4, "test", tmp4, tmp8, tmp10

Output:
Code: [Select]
test
tmp4            3.141593
tmp8            3.141592653589793
tmp10           3.141592653589793238

Re movlpd, it's one byte more and no difference speed-wise, otherwise no difference. Mike Popoloski writes in Demystifying SSE Move Instructions (https://www.gamedev.net/blogs/entry/2250281-demystifying-sse-move-instructions/):

Quote
movaps, movapd, and movdqa will have the same delay no matter what data you use. Since movaps (and movups) is encoded in binary form with one less byte than the other two, it makes sense to use it for all reg-mem moves, regardless of the data type

See also https://stackoverflow.com/questions/36048502/sse-instruction-movsd-extended-floating-point-scalar-vector-operations-on-x8
Title: Re: ArrayIndex timings
Post by: guga on February 10, 2019, 07:26:29 AM
Quote
Real number precision is a can of worms :(

Indeed. I´m having to make a hard choice right now to what is the best approach on RosAsm´s debugger. The numbers are rounded to the nearest (As in OllyDbg and Idapro) but, perhaps, it would be better to keep the numbers precise (For the debugger, mainly). The FPU in RosAsm was last updated more then a decade ago and i didn´t updated it ever since. The routines for the assembler, disassembler and debugger are similar to each other, so i´m thinking what would be the best choice. Maybe creating a option to allow the user round the FPU or not (Specially for the disassembler when facing numbers such as: 99.99999999999997. We chosen to round those kind of numbers for the disassembler but, it maybe good to made this as an user option and not an automated mode.

Thinking :icon_rolleyes: :icon_rolleyes:
Title: Re: ArrayIndex timings
Post by: daydreamer on February 10, 2019, 08:19:25 AM
Quote
Real number precision is a can of worms :(

Indeed. I´m having to make a hard choice right now to what is the best approach on RosAsm´s debugger. The numbers are rounded to the nearest (As in OllyDbg and Idapro) but, perhaps, it would be better to keep the numbers precise (For the debugger, mainly). The FPU in RosAsm was last updated more then a decade ago and i didn´t updated it ever since. The routines for the assembler, disassembler and debugger are similar to each other, so i´m thinking what would be the best choice. Maybe creating a option to allow the user round the FPU or not (Specially for the disassembler when facing numbers such as: 99.99999999999997. We chosen to round those kind of numbers for the disassembler but, it maybe good to made this as an user option and not an automated mode.

Thinking :icon_rolleyes: :icon_rolleyes:
Have to work on regular If macros later,because I want to make special macros for cielab conversions first + other macros for calculations
So first I want to make macro is for check grey pixel is between two values in table that is without jump,so you have freedom to use conditional mov or conditional jump



Title: Timings please
Post by: jj2007 on February 10, 2019, 11:28:16 AM
Hi everybody,

I am testing two slightly different versions of my binary search algos. Can I please have some timings, especially also on AMD, on older, mobile or otherwise exotic CPUs? Thanks :icon14:

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
461 ms for 1000000 matches - integer version A
497 ms for 1000000 matches - integer version B

564 ms for 1000000 matches - double version A
581 ms for 1000000 matches - double version B

460 ms for 1000000 matches - integer version A
457 ms for 1000000 matches - integer version B

578 ms for 1000000 matches - double version A
569 ms for 1000000 matches - double version B

452 ms for 1000000 matches - integer version A
455 ms for 1000000 matches - integer version B

586 ms for 1000000 matches - double version A
611 ms for 1000000 matches - double version B

474 ms for 1000000 matches - integer version A
448 ms for 1000000 matches - integer version B

588 ms for 1000000 matches - double version A
575 ms for 1000000 matches - double version B

483 ms for 1000000 matches - integer version A
465 ms for 1000000 matches - integer version B

610 ms for 1000000 matches - double version A
587 ms for 1000000 matches - double version B

P.S.: During the first run it builds two databases. The sorting may take about 5-10 seconds.
Title: Re: ArrayIndex timings
Post by: guga on February 10, 2019, 04:33:34 PM
Hi Jochen :t

Code: [Select]
Building two 10 Mio elements databases...

Intel(R) Core(TM) i7 CPU         870  @ 2.93GHz
312 ms for 1000000 matches - integer version A
337 ms for 1000000 matches - integer version B

420 ms for 1000000 matches - double version A
431 ms for 1000000 matches - double version B

332 ms for 1000000 matches - integer version A
309 ms for 1000000 matches - integer version B

427 ms for 1000000 matches - double version A
423 ms for 1000000 matches - double version B

307 ms for 1000000 matches - integer version A
318 ms for 1000000 matches - integer version B

428 ms for 1000000 matches - double version A
418 ms for 1000000 matches - double version B

314 ms for 1000000 matches - integer version A
314 ms for 1000000 matches - integer version B

422 ms for 1000000 matches - double version A
411 ms for 1000000 matches - double version B

314 ms for 1000000 matches - integer version A
328 ms for 1000000 matches - integer version B

416 ms for 1000000 matches - double version A
423 ms for 1000000 matches - double version B

--- hit any key ---

About the problem on RosAsm debugger. The error was caused by a FPU_EXCEPTION_PRECISION  on the control word of the FPU. It do returns the correct (?) number  99.9999999999999714 but due to that exception it was rounding only on the buffer that stored the string that converted the FPU to Ascii and not the FPU value itself.

I´m in doubt what  to do because due about those kind of exceptions. I can consider the number as 100, anyway and force it to become 100, right ? It seems to be on the margin of error, though
Title: Re: ArrayIndex timings
Post by: Siekmanski on February 10, 2019, 05:03:01 PM
Code: [Select]
Building two 10 Mio elements databases...

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
348 ms for 1000000 matches - integer version A
302 ms for 1000000 matches - integer version B

393 ms for 1000000 matches - double version A
410 ms for 1000000 matches - double version B

307 ms for 1000000 matches - integer version A
285 ms for 1000000 matches - integer version B

410 ms for 1000000 matches - double version A
396 ms for 1000000 matches - double version B

285 ms for 1000000 matches - integer version A
291 ms for 1000000 matches - integer version B

417 ms for 1000000 matches - double version A
426 ms for 1000000 matches - double version B

298 ms for 1000000 matches - integer version A
316 ms for 1000000 matches - integer version B

445 ms for 1000000 matches - double version A
405 ms for 1000000 matches - double version B

317 ms for 1000000 matches - integer version A
297 ms for 1000000 matches - integer version B

418 ms for 1000000 matches - double version A
429 ms for 1000000 matches - double version B
Title: Re: ArrayIndex timings
Post by: daydreamer on February 10, 2019, 05:25:40 PM
Guga maybe think like an engineer,put measurement 100+-0.1 on blueprint
So the workers know how precision is to let metal bits thru or throw away because it's became too big (>100.01) or too small (<99.99)

Title: Re: ArrayIndex timings
Post by: TimoVJL on February 10, 2019, 08:18:33 PM
Code: [Select]
Building two 10 Mio elements databases...

AMD Athlon(tm) II X2 220 Processor
889 ms for 1000000 matches - integer version A
799 ms for 1000000 matches - integer version B

934 ms for 1000000 matches - double version A
935 ms for 1000000 matches - double version B

795 ms for 1000000 matches - integer version A
795 ms for 1000000 matches - integer version B

934 ms for 1000000 matches - double version A
932 ms for 1000000 matches - double version B

807 ms for 1000000 matches - integer version A
816 ms for 1000000 matches - integer version B

936 ms for 1000000 matches - double version A
929 ms for 1000000 matches - double version B

810 ms for 1000000 matches - integer version A
811 ms for 1000000 matches - integer version B

933 ms for 1000000 matches - double version A
927 ms for 1000000 matches - double version B

805 ms for 1000000 matches - integer version A
795 ms for 1000000 matches - integer version B

931 ms for 1000000 matches - double version A
943 ms for 1000000 matches - double version B

--- hit any key ---
Title: Re: ArrayIndex timings
Post by: Siekmanski on February 10, 2019, 08:22:57 PM
Guga maybe think like an engineer,put measurement 100+-0.1 on blueprint
So the workers know how precision is to let metal bits thru or throw away because it's became too big (>100.01) or too small (<99.99)

Maybe engineers can get some gain and precision with precalculated coefficients.....  8)
Title: Re: ArrayIndex timings
Post by: guga on February 11, 2019, 01:20:23 AM
Siekmanski  :bgrin: :bgrin: :bgrin:

Hi DayDreamer

The problem is that the precision is Ok, it is converting the FPU to 17 digits (considering the dot). The problem is that the functions to convert FPU to String are old and i´m having to review them since it is generating a bug in both Debugger and Disassembler. It was an adaptation of Raymond´s function made by René (the original author), but as i told, some numbers where supposed to be rounded only when the number reaches the 16-17th final digit ending with either "0" to "9" but, for some odd reason it is considering the last 3 digits and to get things worst, when the generated exponent is negative it tries to compute the maximum exponent (similar to the scientific routine in Raymond´s work) but it is giving a error because the generated exponent is turning negative and  i´m not being able to find how to fix that yet.

The main problem of rounding a Real8 or TenByte (real10) is that 99.999999997145 is still not 100 ! But i could, however, consider (for the disassembler and debugger routines)  99.999999999995 or 99.999999999991 or 99.999999999999 as 100 or should i keep those numbers precise disregarding the margin of error in the FPU itself ? This is the kind of thing i´m trying to decide what to do, because on Olly and ida they have different approaches on this sort of thing. (While ollydbg seems to round it anyway (as we did in rosasm), Idapro considers the number more precise including the margin of error (and this caused another error with numbers being converted as 2.546465 e314 for example, when the correct was other value).
Title: Re: ArrayIndex timings
Post by: guga on February 11, 2019, 04:36:54 AM
Ok, guys, fixed one part of RosAsm FPU for the disassembler mode. It correctly outputs the numbers with a bit of more precision then Idapro

Quote
RosAsm
[Data04090B8: F$ 1.84467440737095516e+19]
[Data04090BC: R$ -5.42101086242752217e-20]

Idapro

flt_4090B8      dd 1.8446744e19
dbl_4090BC      dq -5.421010862427522e-20

It still have a few more things to fix before i decide what is the proper method to handle those roundings.  :icon_rolleyes: :icon_rolleyes:
Title: Re: Timings please
Post by: FORTRANS on February 12, 2019, 01:19:39 AM
Hi Jochen,

Hi everybody,

I am testing two slightly different versions of my binary search algos. Can I please have some timings, especially also on AMD, on older, mobile or otherwise exotic CPUs? Thanks :icon14:

   It ,of course, did not run on the P-III.  So, the next oldest still working.

Code: [Select]
F:\TEMP\TEST>ARRAYIND
Building two 10 Mio elements databases...

Intel(R) Pentium(R) M processor 1.70GHz
1111 ms for 1000000 matches - integer version A
990 ms for 1000000 matches - integer version B

1134 ms for 1000000 matches - double version A
1148 ms for 1000000 matches - double version B

989 ms for 1000000 matches - integer version A
990 ms for 1000000 matches - integer version B

1131 ms for 1000000 matches - double version A
1137 ms for 1000000 matches - double version B

993 ms for 1000000 matches - integer version A
986 ms for 1000000 matches - integer version B

1203 ms for 1000000 matches - double version A
1152 ms for 1000000 matches - double version B

1009 ms for 1000000 matches - integer version A
1009 ms for 1000000 matches - integer version B

1159 ms for 1000000 matches - double version A
1157 ms for 1000000 matches - double version B

1007 ms for 1000000 matches - integer version A
1008 ms for 1000000 matches - integer version B

1133 ms for 1000000 matches - double version A
1162 ms for 1000000 matches - double version B

--- hit any key ---

HTH,

Steve N.