News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

ArrayIndex timings

Started by jj2007, February 06, 2019, 01:12:52 AM

Previous topic - Next topic

aw27

Quote from: guga on February 08, 2019, 01:13:03 AM
Hi Aw

Tks:) Looks good, but a few problems.

1) Only works with Dwords and not Real8
2) Only works with positive numbers
3) It´s returning -1 whenever the value is outside the ranger when the necessary was that, on this situation it look for it´s extremes. Ex: Array  = 17,35, 44, 66, 99, 120 . If value to find = 5, it should return 0 in eax (The starting index/pos). If the value to find is 200, it should return 5 (The last index/pos)

1) I have done the Real8 a few days ago.
2) We can generate an array starting where we want, not necessarily in 0. The code makes signed compares, so it will probably work (I believe so, not confirmed).
3) I don't understand what you mean but it should be something pretty easy.  :biggrin:. Anyway, I believe it already does what you want because when the value is above it selects the immediately smaller value in the array. If the value is below the smaller value in the array there is an error indeed but it is easy to cover that case.




aw27

@LiaoMi

If data is already sorted, except for very small arrays, binary search is king. Of course, there are binary search variations, some will work better with some systems other with other systems. One thing I don't believe, because I have never seen, is the GCC produce faster code for Windows than Visual Studio. This never happen unless we don't know how to configure VS.
Sure we can test but not with algorithms provided by C++ classes and stock libraries. And when I refer C++, I could refer C# or Java. I have nothing against classes, actually I work a lot with Delphi which is based in classes, but I know Delphi is not an example of extreme performance.

jj2007

The bug is fixed, please reinstall before building the attached source.

Below a simple graph showing the time needed with ArrayIndex(table, find) over the logarithm of the table size.

jj2007

New testbed including CRT bsearch. Note the timings for repne scasd are for a reduced table, 543 instead of 54321 elements.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz

2000000 random searches performed in 407 ms with repnz scasd for a 543 elements table
2000000 random searches performed in 391 ms with bsearch for a 54321 elements table

ArrayIndex, mismatches not allowed
2000000 random searches performed in 174 ms
2000000 random searches performed in 175 ms
2000000 random searches performed in 173 ms
2000000 random searches performed in 174 ms
2000000 random searches performed in 173 ms

ArrayIndex, mismatches allowed
2000000 random searches performed in 204 ms
2000000 random searches performed in 202 ms
2000000 random searches performed in 203 ms
2000000 random searches performed in 203 ms
2000000 random searches performed in 203 ms


In case somebody wants to test it with non-MasmBasic code, the attached macro allows mov xx, ArrayIndex(pArray, match, lengthof Array):
include \masm32\include\masm32rt.inc ; almost pure Masm32 code
include ArrayIndex.mac ; not for distribution

.data
MyArray dd -2, -1, 0, 1, 11h, 12h, 13h, 14h, 15h, 16h, 22h, 23h, 25h, 51, 53, 37h, 44h, 45h, 46h, 55h, 66h, 77h, 88h

.code
start:
  xor ebx, ebx
  .Repeat
mov edi, MyArray[4*ebx] ; get a value
print "element #"
print str$(ebx), 9, "value "
print str$(edi), "  ", 9, " was found at pos "
print str$(ArrayIndex(offset MyArray, edi, lengthof MyArray)), 13, 10
inc ebx
.Until ebx>=lengthof MyArray
inkey "--- hit any key ---"
  exit

end start

guga

Tks a lot, JJ.

One question. How do i enabled to test the Real8 mode ?  And i saw on the code there is a qword too, how to enable them ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

LiaoMi

 :t

Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz

2000000 random searches performed in 361 ms with repnz scasd for a 543 elements table
2000000 random searches performed in 301 ms with bsearch for a 54321 elements table

ArrayIndex, mismatches not allowed
2000000 random searches performed in 143 ms
2000000 random searches performed in 144 ms
2000000 random searches performed in 143 ms
2000000 random searches performed in 146 ms
2000000 random searches performed in 145 ms

ArrayIndex, mismatches allowed
2000000 random searches performed in 166 ms
2000000 random searches performed in 167 ms
2000000 random searches performed in 172 ms
2000000 random searches performed in 168 ms
2000000 random searches performed in 169 ms
--- hit any key ---

aw27

4 billion tests on a 500,000 elements array!
Now, it works according to Guga's latest specification (for dwords).
I will not spend more time with this, someone will do the real 8, for sure.

***** Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz *****

Number of Tests:4000000, Table Size:500000
Highest Positive Dword:300000, Lowest Negative Dword:-300000
Mismatches not allowed/Sequential Keys
Elapsed time: 177 miliseconds

Number of Tests:4000000, Table Size:500000
Highest Positive Dword:300000, Lowest Negative Dword:-300000
Mismatches allowed/Ramdom Keys, Max. of Random Variation from Table value: 500
Elapsed time: 483 miliseconds

jj2007

Quote from: guga on February 08, 2019, 11:16:54 PMOne question. How do i enabled to test the Real8 mode ?  And i saw on the code there is a qword too, how to enable them ?

The macro is intelligent: if you pass a REAL8 number to search, it knows what to do with it 8)

The QWORD code won't work, it has a little bug, and I am too tired / not interested enough to fix it. One problem is that comisd is a great instruction, but it reacts allergic against certain QWORDs, saying "that's Not A Number!". The solution is obviously to split the comparison in hi dword, then lo dword, but right now I want to drink a glass of good red wine instead ;)

guga

I´ll try on it :) Amazing. many tks :)


Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

Hi JJ

Found a small problem on the Real8 function. When the number to search is a bit different  from the one in table (problem with rounding mode, for example) it returns the previous position.

For example i want to search for the number 100, but due to the rounding mode or convertion using  FPU operations, it may generates something that is not 00000000h 04590000h .

Ex:
NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh . This also is the number 100 in Real8

MyTable: 1, 19, 21, 39, 41, 55, 90, 100

It will return position 6, instead of 7.


Also...using "comisd ... ja" it is looking only for numbers that are positive, right ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

Hi Guga,
In spite of the ja, it works fine with negative numbers (and it fails miserably with jg ;)).

Re "This also is the number 100 in Real8":

;   NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh
  push 04058FFFFh
  push 0FFFFFFFEh      ; even 0FFFFFFFFh will not produce a 100.0
  movlps xmm0, qword ptr [esp]
  fld qword ptr [esp]
  deb 4, "rounding error...", f:xmm0, ST(0)

Output:
rounding error...
f:xmm0          99.99999999999997
ST(0)           99.99999999999997158

guga

Hmm..indeed.

It seems that there is a bug in RosAsm float to string :shock:. On the debugger it is rounding and most likely on the assembler too. I´ll take a look and test. The same thing seems to happens with negative numbers.

Thanks a lot :t :t :t :t
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

aw27

Quote from: jj2007 on February 09, 2019, 12:27:20 PM
Hi Guga,
In spite of the ja, it works fine with negative numbers (and it fails miserably with jg ;)).

Re "This also is the number 100 in Real8":

;   NumbertoFind: dd 0FFFFFFFEh, 04058FFFFh
  push 04058FFFFh
  push 0FFFFFFFEh      ; even 0FFFFFFFFh will not produce a 100.0
  movlps xmm0, qword ptr [esp]
  fld qword ptr [esp]
  deb 4, "rounding error...", f:xmm0, ST(0)

Output:
rounding error...
f:xmm0          99.99999999999997
ST(0)           99.99999999999997158

Why don't you use movlpd if you are dealing with real8?

jj2007

Real number precision is a can of worms :(

The deb macro tries to use the optimal number of digits depending on the size:

  fldpi
  fst tmp4  ; REAL4
  fst tmp8  ; REAL8
  fstp tmp10
  deb 4, "test", tmp4, tmp8, tmp10


Output:test
tmp4            3.141593
tmp8            3.141592653589793
tmp10           3.141592653589793238


Re movlpd, it's one byte more and no difference speed-wise, otherwise no difference. Mike Popoloski writes in Demystifying SSE Move Instructions:

Quotemovaps, movapd, and movdqa will have the same delay no matter what data you use. Since movaps (and movups) is encoded in binary form with one less byte than the other two, it makes sense to use it for all reg-mem moves, regardless of the data type

See also https://stackoverflow.com/questions/36048502/sse-instruction-movsd-extended-floating-point-scalar-vector-operations-on-x8

guga

QuoteReal number precision is a can of worms :(

Indeed. I´m having to make a hard choice right now to what is the best approach on RosAsm´s debugger. The numbers are rounded to the nearest (As in OllyDbg and Idapro) but, perhaps, it would be better to keep the numbers precise (For the debugger, mainly). The FPU in RosAsm was last updated more then a decade ago and i didn´t updated it ever since. The routines for the assembler, disassembler and debugger are similar to each other, so i´m thinking what would be the best choice. Maybe creating a option to allow the user round the FPU or not (Specially for the disassembler when facing numbers such as: 99.99999999999997. We chosen to round those kind of numbers for the disassembler but, it maybe good to made this as an user option and not an automated mode.

Thinking :icon_rolleyes: :icon_rolleyes:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com