ArrayIndex timings

aw27 · February 08, 2019, 04:08:10 AM

Quote from: guga on February 08, 2019, 01:13:03 AM
Hi Aw

Tks:) Looks good, but a few problems.

1) Only works with Dwords and not Real8
2) Only works with positive numbers
3) It´s returning -1 whenever the value is outside the ranger when the necessary was that, on this situation it look for it´s extremes. Ex: Array = 17,35, 44, 66, 99, 120 . If value to find = 5, it should return 0 in eax (The starting index/pos). If the value to find is 200, it should return 5 (The last index/pos)

1) I have done the Real8 a few days ago.
2) We can generate an array starting where we want, not necessarily in 0. The code makes signed compares, so it will probably work (I believe so, not confirmed).
3) I don't understand what you mean but it should be something pretty easy.

. Anyway, I believe it already does what you want because when the value is above it selects the immediately smaller value in the array. If the value is below the smaller value in the array there is an error indeed but it is easy to cover that case.

aw27 · February 08, 2019, 04:15:48 AM

@LiaoMi

If data is already sorted, except for very small arrays, binary search is king. Of course, there are binary search variations, some will work better with some systems other with other systems. One thing I don't believe, because I have never seen, is the GCC produce faster code for Windows than Visual Studio. This never happen unless we don't know how to configure VS.
Sure we can test but not with algorithms provided by C++ classes and stock libraries. And when I refer C++, I could refer C# or Java. I have nothing against classes, actually I work a lot with Delphi which is based in classes, but I know Delphi is not an example of extreme performance.

jj2007 · February 08, 2019, 12:47:44 PM

The bug is fixed, please reinstall before building the attached source.

Below a simple graph showing the time needed with ArrayIndex(table, find) over the logarithm of the table size.

jj2007 · February 08, 2019, 10:42:43 PM

New testbed including CRT bsearch. Note the timings for repne scasd are for a reduced table, 543 instead of 54321 elements.

Code Select

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

2000000 random searches performed in 407 ms with repnz scasd for a 543 elements table
2000000 random searches performed in 391 ms with bsearch for a 54321 elements table

ArrayIndex, mismatches not allowed
2000000 random searches performed in 174 ms
2000000 random searches performed in 175 ms
2000000 random searches performed in 173 ms
2000000 random searches performed in 174 ms
2000000 random searches performed in 173 ms

ArrayIndex, mismatches allowed
2000000 random searches performed in 204 ms
2000000 random searches performed in 202 ms
2000000 random searches performed in 203 ms
2000000 random searches performed in 203 ms
2000000 random searches performed in 203 ms

In case somebody wants to test it with non-MasmBasic code, the attached macro allows mov xx, ArrayIndex(pArray, match, lengthof Array):

Code Select

include \masm32\include\masm32rt.inc	; almost pure Masm32 code
include ArrayIndex.mac	; not for distribution

.data
MyArray dd -2, -1, 0, 1, 11h, 12h, 13h, 14h, 15h, 16h, 22h, 23h, 25h, 51, 53, 37h, 44h, 45h, 46h, 55h, 66h, 77h, 88h

.code
start:
  xor ebx, ebx
  .Repeat
	mov edi, MyArray[4*ebx]	; get a value
	print "element #"
	print str$(ebx), 9, "value "
	print str$(edi), "  ", 9, " was found at pos "
	print str$(ArrayIndex(offset MyArray, edi, lengthof MyArray)), 13, 10
	inc ebx
 .Until ebx>=lengthof MyArray
 inkey "--- hit any key ---"
  exit

end start

guga · February 08, 2019, 11:16:54 PM

Tks a lot, JJ.

One question. How do i enabled to test the Real8 mode ? And i saw on the code there is a qword too, how to enable them ?

LiaoMi · February 09, 2019, 04:02:59 AM

:t

Code Select

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz

2000000 random searches performed in 361 ms with repnz scasd for a 543 elements table
2000000 random searches performed in 301 ms with bsearch for a 54321 elements table

ArrayIndex, mismatches not allowed
2000000 random searches performed in 143 ms
2000000 random searches performed in 144 ms
2000000 random searches performed in 143 ms
2000000 random searches performed in 146 ms
2000000 random searches performed in 145 ms

ArrayIndex, mismatches allowed
2000000 random searches performed in 166 ms
2000000 random searches performed in 167 ms
2000000 random searches performed in 172 ms
2000000 random searches performed in 168 ms
2000000 random searches performed in 169 ms
--- hit any key ---

aw27 · February 09, 2019, 04:19:56 AM

4 billion tests on a 500,000 elements array!
Now, it works according to Guga's latest specification (for dwords).
I will not spend more time with this, someone will do the real 8, for sure.

***** Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz *****

Number of Tests:4000000, Table Size:500000
Highest Positive Dword:300000, Lowest Negative Dword:-300000
Mismatches not allowed/Sequential Keys
Elapsed time: 177 miliseconds

Number of Tests:4000000, Table Size:500000
Highest Positive Dword:300000, Lowest Negative Dword:-300000
Mismatches allowed/Ramdom Keys, Max. of Random Variation from Table value: 500
Elapsed time: 483 miliseconds

jj2007 · February 09, 2019, 06:46:16 AM

Quote from: guga on February 08, 2019, 11:16:54 PMOne question. How do i enabled to test the Real8 mode ? And i saw on the code there is a qword too, how to enable them ?

The macro is intelligent: if you pass a REAL8 number to search, it knows what to do with it 8)

The QWORD code won't work, it has a little bug, and I am too tired / not interested enough to fix it. One problem is that comisd is a great instruction, but it reacts allergic against certain QWORDs, saying "that's Not A Number!". The solution is obviously to split the comparison in hi dword, then lo dword, but right now I want to drink a glass of good red wine instead ;)

guga · February 09, 2019, 08:10:33 AM

I´ll try on it :) Amazing. many tks :)

guga · February 09, 2019, 11:14:27 AM

Hi JJ

Found a small problem on the Real8 function. When the number to search is a bit different from the one in table (problem with rounding mode, for example) it returns the previous position.

For example i want to search for the number 100, but due to the rounding mode or convertion using FPU operations, it may generates something that is not 00000000h 04590000h .

Ex:
NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh . This also is the number 100 in Real8

MyTable: 1, 19, 21, 39, 41, 55, 90, 100

It will return position 6, instead of 7.

Also...using "comisd ... ja" it is looking only for numbers that are positive, right ?

jj2007 · February 09, 2019, 12:27:20 PM

Hi Guga,
In spite of the ja, it works fine with negative numbers (and it fails miserably with jg ;)).

Re "This also is the number 100 in Real8":

Code Select

;   NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh
  push 04058FFFFh
  push 0FFFFFFFEh      ; even 0FFFFFFFFh will not produce a 100.0
  movlps xmm0, qword ptr [esp]
  fld qword ptr [esp]
  deb 4, "rounding error...", f:xmm0, ST(0)

Output:

Code Select

rounding error...
f:xmm0          99.99999999999997
ST(0)           99.99999999999997158

guga · February 09, 2019, 01:05:38 PM

Hmm..indeed.

It seems that there is a bug in RosAsm float to string :shock:. On the debugger it is rounding and most likely on the assembler too. I´ll take a look and test. The same thing seems to happens with negative numbers.

Thanks a lot :t :t :t :t

aw27 · February 09, 2019, 01:53:21 PM

Quote from: jj2007 on February 09, 2019, 12:27:20 PM
Hi Guga,
In spite of the ja, it works fine with negative numbers (and it fails miserably with jg ;)).

Re "This also is the number 100 in Real8":

Code Select Expand
; NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh push 04058FFFFh push 0FFFFFFFEh ; even 0FFFFFFFFh will not produce a 100.0 movlps xmm0, qword ptr [esp] fld qword ptr [esp] deb 4, "rounding error...", f:xmm0, ST(0)
Output:
Code Select Expand
rounding error... f:xmm0 99.99999999999997 ST(0) 99.99999999999997158

Why don't you use movlpd if you are dealing with real8?

jj2007 · February 09, 2019, 02:07:22 PM

Real number precision is a can of worms :(

The deb macro tries to use the optimal number of digits depending on the size:

Code Select

  fldpi
  fst tmp4  ; REAL4
  fst tmp8  ; REAL8
  fstp tmp10
  deb 4, "test", tmp4, tmp8, tmp10

Output:

Code Select

test
tmp4            3.141593
tmp8            3.141592653589793
tmp10           3.141592653589793238

Re movlpd, it's one byte more and no difference speed-wise, otherwise no difference. Mike Popoloski writes in Demystifying SSE Move Instructions:

Quotemovaps, movapd, and movdqa will have the same delay no matter what data you use. Since movaps (and movups) is encoded in binary form with one less byte than the other two, it makes sense to use it for all reg-mem moves, regardless of the data type

See also https://stackoverflow.com/questions/36048502/sse-instruction-movsd-extended-floating-point-scalar-vector-operations-on-x8

guga · February 10, 2019, 07:26:29 AM

QuoteReal number precision is a can of worms :(

Indeed. I´m having to make a hard choice right now to what is the best approach on RosAsm´s debugger. The numbers are rounded to the nearest (As in OllyDbg and Idapro) but, perhaps, it would be better to keep the numbers precise (For the debugger, mainly). The FPU in RosAsm was last updated more then a decade ago and i didn´t updated it ever since. The routines for the assembler, disassembler and debugger are similar to each other, so i´m thinking what would be the best choice. Maybe creating a option to allow the user round the FPU or not (Specially for the disassembler when facing numbers such as: 99.99999999999997. We chosen to round those kind of numbers for the disassembler but, it maybe good to made this as an user option and not an automated mode.

Thinking :icon_rolleyes: :icon_rolleyes:

The MASM Forum

News:

ArrayIndex timings

aw27

aw27

jj2007

jj2007

guga

LiaoMi

aw27

jj2007

guga

guga

jj2007

guga

aw27

jj2007

guga