### Author Topic: Fast Compare Real8 with SSE and ColorSpaces  (Read 10914 times)

#### guga

• Member
• Posts: 1112
• Assembly is a state of art.
##### Fast Compare Real8 with SSE and ColorSpaces
« on: January 29, 2019, 02:31:06 PM »
Hi Guys

Somebody suceeded to create a faster way to find a Real8 value among a table of 256 ?

The problems is like thiss:

given a table of 256 Real8 values from 0 to 100 (And all in crescent order from 0 to 100 only)

Btw: R\$ stands for Real8 notation ;)

[MyTable:
Data0: R\$  0 ; Index = 0
Data1: R\$ 4.89e-2 ; Index = 1
Data2: R\$ 9.79e-2 ; Index = 2
Data3: R\$ 1.46e-1 ; Index = 3
Data4: R\$ 1.95e-1 ; Index = 4
(...)
Data78: R\$ 9.44
Data79: R\$ 9.67
(...)
Data150: R\$ 33.6 ; Index = 150
Data151: R\$ 34.13 ; Index = 151
(...)
Data254: R\$ 99.18 ; Index = 254
Data255: R\$ 100] ; Index = 255

MyValue = R\$ 78.5212

The goal is to find the value "78.5212" among the 256 Real8 Array (In crescent order) and returns the corresponding index of it.

I made a routine for that, but it is kinda slow, though :redface: :redface: :redface: Not sure how to use SSE to make it faster

Code: [Select]
`    ; ebx points to the end of the array. So points to "Data255"    Fpu_If R@MyValue >= R\$ebx        mov eax 255 | jmp L1> ; Found value on the last array (the 255th one). Set it´s index to eax    Fpu_End_If    mov ecx 254    Do        Fpu_If_And R@MyValue < R\$ebx, R@MyValue >= R\$ebx-8   ;If value was found in between the current and the previous, we have a match            mov eax ecx | jmp L1> ; Found value and set it´s index to eax        Fpu_End_If        sub ebx 8        dec ecx    Loop_Until ecx <= 0L1:`
« Last Edit: March 10, 2019, 04:36:53 AM by guga »
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### jj2007

• Member
• Posts: 10260
• Assembler is fun ;-)
##### Re: Fast Compare Real8 with SSE
« Reply #1 on: January 29, 2019, 03:24:21 PM »
Option 0 below is probably the fastest solution (project attached):

table REAL8 1234567890.1, 1234567890.2, 1234567890.3, 1234567890.4, 1234567890.5, 1234567890.6

Init
fld FP8(1234567890.5)   ; we want to see Match #5
sub esp, REAL8
fstp REAL8 ptr [esp]
movlps xmm1, REAL8 ptr [esp]
mov esi, offset table
xor ecx, ecx
if 1  ; 1=use Fcmp, 0=use pcmpeqb & friends
.Repeat
movlps xmm0, [esi+REAL8*ecx]
inc ecx
Fcmp xmm0, xmm1, high   ; high = be reasonably strict; try top or medium
.Until Zero? || ecx>5
else
.Repeat
movlps xmm0, [esi+REAL8*ecx]
pcmpeqb xmm0, xmm1
pmovmskb eax, xmm0
inc ecx
.Until ecx>5 || ax==0FFFFh
endif
.if Zero?
Inkey Str\$("Match at index %i", ecx)
.else
Inkey "No match, sorry"
.endif
EndOfCode

Option 1, Fcmp, is a bit slower, but the problem here is that the fast solution requires an exact match. Real world examples rarely have such exact matches. What do you need it for?

#### guga

• Member
• Posts: 1112
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #2 on: January 29, 2019, 04:26:57 PM »
Tks a lot, Jochen :t :t :t

I´ll give it a try It´s for a two functions i´m making to convert from RGB to CieLCH (and vice-versa). I´m also studying how you managed to make a faster sin and cosine function. So far, the functions i built are fast for image processing, but still not fast enough for video. With the help of those optimizations i hope the functions can be improved. If i succeed to make it work as expected i´ll build a library and post it here for you :)

I was a bit surprised about the power of the CieLCH conversions and it´s precision and started to give a try on then :) I was amazed whern found out after doing the maths envolved with those colorspaces that, in fact we can get rid of tons of useless colors combinations that are, in fact, indistinguishable from each other. I already succeeded to reduce the luminance values to fit´s to the range of 0 to 255 and it seems that for chroma and hue it has some restrictions as well. It seems that chrom and hue are related to each other on every luminance value from a table. So it would be much easier to perform the conversions and, at the same time, make the general aspect of a image (or video) be more consistent to what the CieLCH colorspace is actually doing.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### guga

• Member
• Posts: 1112
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #3 on: January 29, 2019, 11:05:18 PM »
Hi JJ

I tried the pcmpeqb xmm0, xmm1 but it´s not working. It´s oly exiting the loop when both values are exactly the  same.

The goal is to exit the loop when the value in MyTable is bigger or equal to the inputed value "1234567890.5"

For instance, if the difference between them are more then 8 digits, the routine dn´t exit the loop and don´t find the number.

I also tried with pcmpgtd/movlpd but no sucess. I did this:

Code: [Select]
`   mov ebx MYTable    movlpd xmm1 X@ValuetoFind    xor ecx ecx    Do        movlpd XMM0 X\$ebx+ecx*8        pcmpgtd XMM0 XMM1        pmovmskb eax XMM0        If eax <> 0            dec ecx | jmp L1>        End_If        inc ecx    Loop_Until ecx > 255`
Ex:
[MyTable: R\$ 0 ; index1
2 ; index2
4 ; index3
6 ; index 4
8] ; index 5

MyNumber = 3

Then it will exit the loop when the value was found  in Between 2 and 4. In, short, when it reaches "4" it will exit the loop, since 4 is bigger then 3 (Also needs to work for equal values as well and not only bigger then)

A question, why comparing also ax with -1 ?
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### hutch--

• Member
• Posts: 7212
• Mnemonic Driven API Grinder
##### Re: Fast Compare Real8 with SSE
« Reply #4 on: January 30, 2019, 01:44:10 AM »
Guga,

See if "comisd" is more suited to the comparison you need.
hutch at movsd dot com
http://www.masm32.com

#### daydreamer

• Member
• Posts: 1218
• I also want a stargate
##### Re: Fast Compare Real8 with SSE
« Reply #5 on: January 30, 2019, 03:18:42 AM »
Hutch is right comissd for conditional jump
Organize in Search tree maybe faster?
Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D
I love assembly,because its legal to write
princess:lea eax,luke
:)

#### jj2007

• Member
• Posts: 10260
• Assembler is fun ;-)
##### Re: Fast Compare Real8 with SSE
« Reply #6 on: January 30, 2019, 03:37:13 AM »
COMISD works but it is exactly as fast as the pcmpeqb/pmovmskb combi. Besides, the problem remains that equality means for both variants 53 identical bits in the mantissa. If the matches are guaranteed to be exactly equal at REAL8 level, fine, otherwise you need to allow for a delta as in Fcmp.

Below results for a 50 Million elements array of random doubles in the range -10.0 ... +10.0
The first run has an exact match at the end of the array.
For the second run, the search value was increased by a tiny amount.

The issue is tricky and requires a good task definition. My impression is that working with integers would be a wiser strategy.

Code: [Select]
`Intel(R) Core(TM) i5-2450M CPU @ 2.50GHzlast elements of double array:49999999        -1.0984579985961250000000        -6.6993242874741550000001        0.0Now trying to find a match for -6.699324287474150097Finding match at pos 50000000 (-6.699324287474150097) took 825 ms using FcmpFinding match at pos 50000000 (-6.699324287474150097) took 70 ms using pcmpeqbFinding match at pos 50000000 (-6.699324287474150097) took 70 ms using comisdNow trying to find a match for -6.698733009397984439Finding match at pos 20284227 (-6.698732967488462364) took 346 ms using FcmpNo match using pcmpeqb, sorryNo match using comisd, sorry`

#### AW

• Member
• Posts: 2583
• Let's Make ASM Great Again!
##### Re: Fast Compare Real8 with SSE
« Reply #7 on: January 30, 2019, 04:22:17 AM »
I don't think SSE will be advantageous, what we need is search using Binary Search. It will get there in a maximum of 9 iterations.

Code: [Select]
`H:\temp>binsearchSearching for 937.5896481215857 (position 237)Found 937.5896481215857 in position 237 in 7 iterationsH:\temp>binsearchSearching for 477.70622882778406 (position 114)Found 477.70622882778406 in position 114 in 8 iterationsH:\temp>binsearchSearching for 945.06668294320502 (position 246)Found 945.06668294320502 in position 246 in 8 iterationsH:\temp>binsearchSearching for 532.82265694143496 (position 122)Found 532.82265694143496 in position 122 in 8 iterationsH:\temp>binsearchSearching for 995.36118655964844 (position 255)Found 995.36118655964844 in position 255 in 9 iterationsH:\temp>binsearchSearching for 552.75124362926124 (position 131)Found 552.75124362926124 in position 131 in 6 iterationsH:\temp>binsearchSearching for 526.77999206518757 (position 140)Found 526.77999206518757 in position 140 in 8 iterationsH:\temp>binsearchSearching for 98.544267097994933 (position 16)Found 98.544267097994933 in position 16 in 8 iterationsH:\temp>binsearchSearching for 617.23685415204318 (position 149)Found 617.23685415204318 in position 149 in 7 iterationsH:\temp>binsearchSearching for 205.78630939664907 (position 52)Found 205.78630939664907 in position 52 in 8 iterationsH:\temp>binsearchSearching for 672.10913418988616 (position 184)Found 672.10913418988616 in position 184 in 8 iterationsH:\temp>binsearchSearching for 250.09918515579699 (position 60)Found 250.09918515579699 in position 60 in 8 iterationsH:\temp>binsearchSearching for 754.02081362346257 (position 193)Found 754.02081362346257 in position 193 in 7 iterationsH:\temp>binsearchSearching for 277.16910306100652 (position 69)Found 277.16910306100652 in position 69 in 7 iterationsH:\temp>binsearchSearching for 787.62169255653555 (position 202)Found 787.62169255653555 in position 202 in 8 iterationsH:\temp>binsearchSearching for 320.99368266853844 (position 78)Found 320.99368266853844 in position 78 in 8 iterationsH:\temp>binsearchSearching for 846.88863795892212 (position 210)Found 846.88863795892212 in position 210 in 8 iterationsH:\temp>binsearchSearching for 350.0167851802118 (position 87)Found 350.0167851802118 in position 87 in 5 iterations`
The attachment contains source in C and .exe . It is easy to convert to ASM.

#### guga

• Member
• Posts: 1112
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #8 on: January 30, 2019, 04:26:30 AM »
Many Tks, Steve and JJ :)

Steve, i gave a try on comisd and comis but, it seems that they are slower then the regular fcomip accordlying to https://stackoverflow.com/questions/37766131/intel-x86-64-assembly-compare-signed-double-precision-floats.
So i did a variation of the macro i use for Fpu Comparitions, and this worked but, i´m not sure yet about speed yet :icon_rolleyes:

Code: [Select]
`(...)    lea ecx D@TmpRedDis | fld R@pXDis | fmul R\$esi+WS_Matrix.Inverted.Red_M1Dis | fld R@pYDis | fmul R\$esi+WS_Matrix.Inverted.Green_M2Dis | faddp ST1 ST0 | fld R@pZDis | fmul R\$esi+WS_Matrix.Inverted.Blue_M3Dis | faddp ST1 ST0 | fstp R\$ecx    lea ecx D@TmpGreenDis | fld R@pXDis | fmul R\$esi+WS_Matrix.Inverted.Red_M4Dis | fld R@pYDis | fmul R\$esi+WS_Matrix.Inverted.Green_M5Dis | faddp ST1 ST0 | fld R@pZDis | fmul R\$esi+WS_Matrix.Inverted.Blue_M6Dis | faddp ST1 ST0 | fstp R\$ecx    lea ecx D@TmpBlueDis | fld R@pXDis | fmul R\$esi+WS_Matrix.Inverted.Red_M7Dis | fld R@pYDis | fmul R\$esi+WS_Matrix.Inverted.Green_M8Dis | faddp ST1 ST0 | fld R@pZDis | fmul R\$esi+WS_Matrix.Inverted.Blue_M9Dis | faddp ST1 ST0 | fstp R\$ecx    ; After computing the values to be found, let´s scan on our look up table    lea ebx D\$esi+WS_Matrix.KFactorMapDis ; Points to the look up table    mov edi D@Red ; Output result in "Red" Variable    fld R@TmpRedDis | fmul R\$Float100 | fstp R@TmpRedDis    fld R@TmpRedDis    xor ecx ecx    Do        fld R\$ebx+ecx*8 | fcomip ST0 ST1 | jna L0>            ffree ST0            dec ecx | jmp L1>        L0:        inc ecx    Loop_Until ecx > 255    dec ecxL1:    mov D\$edi ecx`

JJ, i can´t work with integer, unfortunatelly. Not on this part of the code. What returns in "R@TmpRedDis" is, in fact, the result of a computation of the red pixel, so it is a fraction from 0 to 100. Each value on the table i´m using contains the result of "TmpRedDis" whose value is computed on a power basics. So, TempRed = XXX*YYY +1/(z^C) etc etc. The good thing is that the table contains 256 Real8 which values are ordered from 0 to 100 (in real8)...so, 0.04456456, 0.689798, 1.23258., 1.989, ....100 Which makes the searching  faster to scan, i presume.

I´m trying to replace the function to findd the proper value of TempRed with a simple scan to it´s look up table rather then using the regular function to compute it  directly.

The direct function uses a power of 2.4 to retrieve the correct (final) value of RedPixel (originated from TmpRedDis).

The "direct" function is like this:

Code: [Select]
`(...)    lea ecx D@TmpRedDis | fld R@pXDis | fmul R\$esi+WS_Matrix.Inverted.Red_M1Dis | fld R@pYDis | fmul R\$esi+WS_Matrix.Inverted.Green_M2Dis | faddp ST1 ST0 | fld R@pZDis | fmul R\$esi+WS_Matrix.Inverted.Blue_M3Dis | faddp ST1 ST0 | fstp R\$ecx    lea ecx D@TmpGreenDis | fld R@pXDis | fmul R\$esi+WS_Matrix.Inverted.Red_M4Dis | fld R@pYDis | fmul R\$esi+WS_Matrix.Inverted.Green_M5Dis | faddp ST1 ST0 | fld R@pZDis | fmul R\$esi+WS_Matrix.Inverted.Blue_M6Dis | faddp ST1 ST0 | fstp R\$ecx    lea ecx D@TmpBlueDis | fld R@pXDis | fmul R\$esi+WS_Matrix.Inverted.Red_M7Dis | fld R@pYDis | fmul R\$esi+WS_Matrix.Inverted.Green_M8Dis | faddp ST1 ST0 | fld R@pZDis | fmul R\$esi+WS_Matrix.Inverted.Blue_M9Dis | faddp ST1 ST0 | fstp R\$ecx    ; After computing the values to be found, let´s do the computations to find red, Green and Blue using "Power of" maths    lea ecx D@TmpRedDis    .Fpu_If R\$ecx > R\$esi+WS_Matrix.GammaDecode.TresholdDis        ;Color = ((Color+0.055)/1.055)^2.4        fld R\$esi+WS_Matrix.GammaDecode.GammaDis | fld R\$ecx | fyl2x | fld1 | fld ST1 | fprem | f2xm1 | faddp ST1 ST0 | fscale | fxch | fstp ST0        fmul R\$esi+WS_Matrix.GammaDecode.OffsetPlusDis | fsub R\$esi+WS_Matrix.GammaDecode.OffsetDis    .Fpu_Else         ; Here is faster, but happens only when "index" (Pixel color) is (in general) smaller then 12.12354(or something..this value is not fixed, although it is small). So, the odds of a pixel falls on this jmp routine are 12/256.        fld R\$ecx | fmul R\$esi+WS_Matrix.GammaDecode.SlopeDis    .Fpu_End_If    mov ecx D@Red | fmul R\$Float255 | fistp F\$ecx`
See ? The function makes usage of logaritm FPU, scaling etc etc...All of this seems to be slower then a loop scan on the LookupTable, though.

Didn´t have time to test for speed yet, but it seems that a direct computation is slower then a simple scan on the  256 Real8 values.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### guga

• Member
• Posts: 1112
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #9 on: January 30, 2019, 04:27:49 AM »
Hi Aw, a binary search ? Hmm...indeed..I´l take a look. Many tks :)
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### daydreamer

• Member
• Posts: 1218
• I also want a stargate
##### Re: Fast Compare Real8 with SSE
« Reply #10 on: January 30, 2019, 05:41:26 AM »
I am trying something similar, but enhance it with lookup one double and the following entry in the data and calculate mean value of two,or more advanced calculation based also on which number it is closer affect it more than the one that its farther to

Quote from Flashdance
Nick  :  When you give up your dream, you die
*wears a flameproof asbestos suit*
Gone serverside programming p:  :D
I love assembly,because its legal to write
princess:lea eax,luke
:)

#### jj2007

• Member
• Posts: 10260
• Assembler is fun ;-)
##### Re: Fast Compare Real8 with SSE
« Reply #11 on: January 30, 2019, 06:44:28 AM »
OK, now I see it's an entirely different story. It could be done with a simple cmp eax, [esi+8*ecx] - assuming that the low DWORD is unique in that table of 256.

#### guga

• Member
• Posts: 1112
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #12 on: January 30, 2019, 12:20:26 PM »
Hi Jochen.

I´m not sure if the Low Dword values on the table are unique. I´ll take a look at them :t :t :t. But...even if they are unique, how a look up table containing only the low dwords  will work better ?

The main routine to calculate those values used in the KFactorMap are computed way before the main converter (RGBtoCieLCH and CieLCHtoRGB) starts. The values are calculated from the gamma and offset  related to each monitor/camera/colorspace etc (Adobe1998, Bruce, SMTP etc etc)

Code: [Select]
`    ; and create our KFactor Map for each color. Put the resultant values on the main Structure to form the table    xor ecx ecx    Do        call FindKFactor ecx, D@pOutStruct        inc ecx    Loop_Until ecx > 255`
And the main routine is at:
Code: [Select]
`; RGBtoCieLCH_ExProc FindKFactor:    Arguments @Color, @pMatrix    Structure @TempStorage 64, @pKfactorColorDis 0, @pKFactorWhiteRefXDis 8, @pKFactorWhiteRefYDis 16, @pKFactorWhiteRefZDis 24, @TmpGreenDis 32, @TmpBlueDis 40, @pAFactorDis 48, @pBFactorDis 56    Uses esi, ebx, ecx, edx, edi    finit    lea ecx D@pKfactorColorDis | fild F@Color | fmul R\$FloatOne_255 | fstp R\$ecx    mov esi D@pMatrix    ; Points to the Start of KfactorMap array    mov ebx esi | add ebx WS_Matrix.KFactorMapDis ; Points to the start of the KFactor Map array in WS_Matrix Structure    xor edx edx    mov eax 8;Size_Of_KFactorMap    mul D@Color    add ebx eax    ; newColorRed = ((R@Red+0.055)/1.055)^2.4 ; where2.4 = gamma, 0.055 = Offset, 1.055 = Offset+1. Better precalculating them as i did.    lea ecx D@pKfactorColorDis    .Fpu_If R\$ecx > R\$esi+WS_Matrix.GammaEncode.TresholdDis        fld R\$esi+WS_Matrix.GammaEncode.GammaDis | fld R\$esi+WS_Matrix.GammaEncode.OffsetDis | fadd R\$ecx | fmul R\$esi+WS_Matrix.GammaEncode.OffsetPlusDis | fyl2x | fld1 | fld ST1 | fprem | f2xm1 | faddp ST1 ST0 | fscale | fxch | fstp ST0    .Fpu_Else        fld R\$ecx | fdiv R\$esi+WS_Matrix.GammaEncode.SlopeDis    .Fpu_End_If    fmul R\$Float100 | fstp R\$ebx;+KFactorMap.ColorDis ; Original i wrote it as KFactorRed, KfactorGreen, KfactorBlue, but, in fact it is only 1 color/pixel to check. So... 256 in totalEndP`
These functions are giving me a tremendous headache. I just found out that the threshold must be analysed (On the main convertiopn function CieLCHtoRGB and not the above one;) ) . When i convert colors such as Red = 12, Green = 14, Blue = 0, the resultant vaules after tyhe multiplication of the matrices are below the "WS_Matrix.GammaDecode.Treshold". So i´ll need to try to find a integer value and put that liek another members of the structure i´m using to create the table. Once i found if the threshold can be converted to a integer, it would be eaiser to make the check doing the convertion back after the matrix multiplication.

Something like:

Code: [Select]
`mov eax D\$esi+ThresholdIntegerfld R@TmpRedDis | fmul R\$esi+WS_Matrix.GammaDecode.SlopeDis | fmul R\$Float255 | fistp D@TempColorCheckIf D@TempColorCheck < eax      return D@TempColorCheck ; the color of this pixel was found under the treshold.Else     ; perform check on the loop up table (Binary search, or whatever other method)EndIf`

I hope i can find a way to create a integer related to this threshold. The only problem i see is that we will need extra coding (mul by slope and mul by 255) to pixels that are found in less then 13% of the cases.

A true pain make those functions works correctly and to get things worst, all the papers i read so far are utterly complicated and tends to put different equations to create the very same thing, rather then simply trying to merge those equations together and then try to simplify all of this.

At least i´ve got the function works (well..kind of :icon_mrgreen: :icon_mrgreen:) and the only thing that needs to be done is optimize all of this.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

#### AW

• Member
• Posts: 2583
• Let's Make ASM Great Again!
##### Re: Fast Compare Real8 with SSE
« Reply #13 on: January 30, 2019, 08:31:57 PM »
All right, finally a MASM solution in this thread (which is derived from my initial C solution  :lol:) adjusting for the requirement that the search value doesn't need to be exact.

So, I adjusted the original C algo like this:

Code: [Select]
`int binarySearch(double sortedArray[], int first, int last, double key, int *iterations){ while (first <= last) { int mid = first + (last-first) / 2; (*iterations)++; if (key == sortedArray[mid]) return mid; else if (key < sortedArray[mid]) last = mid - 1; else { if (key < sortedArray[mid + 1]) return mid; else first = mid + 1; } } return -1;}`
and can be translated into an ASM module like this, after deoptimizing as needed the compiler ASM output:
Code: [Select]
`.model flat, Coption casemap :none.codebinSearch proc uses esi edi ebx sortedArray:ptr, first:dword, last:dword, _key:real8, iteration:ptr fld QWORD PTR _key mov edi, last mov esi, first mov ebx, iteration@startLoop: inc dword ptr[ebx] ; if (key == sortedArray[mid]) fld ST(0) ;  duplicate stack mov eax, edi sub eax, esi cdq sub eax, edx mov ecx, eax sar ecx, 1 add ecx, esi mov edx, sortedArray fld QWORD PTR [edx+ecx*8] fucompp fnstsw ax test ah, 68 ; 00000044H jnp @exit fcom QWORD PTR [edx+ecx*8] fnstsw ax test ah, 5 jp SHORT @checkElse lea edi, DWORD PTR [ecx-1] jmp SHORT @endLoop@checkElse: fcom QWORD PTR [edx+ecx*8+8] fnstsw ax test ah, 5 jnp SHORT @exit lea esi, DWORD PTR [ecx+1]@endLoop: cmp esi, edi jle SHORT @startLoop@exit: fstp ST(0) mov eax, ecx retbinSearch endpend`
So, lets test 100 times:

Code: [Select]
`Searching for 393.98684652241582 (position 104)Found 392.98684652241582 in position 104 in 8 iterationsSearching for 1.488296151615955 (position 0)Found 0.48829615161595508 in position 0 in 8 iterationsSearching for 343.86123233741267 (position 91)Found 340.86123233741267 in position 91 in 6 iterationsSearching for 866.07766960661638 (position 218)Found 865.07766960661638 in position 219 in 6 iterationsSearching for 768.01458784752947 (position 198)Found 766.01458784752947 in position 198 in 8 iterationsSearching for 457.86919766838588 (position 125)Found 457.86919766838588 in position 125 in 7 iterationsSearching for 388.74477370525221 (position 102)Found 388.74477370525221 in position 102 in 8 iterationsSearching for 180.82216864528337 (position 51)Found 180.82216864528337 in position 51 in 6 iterationsSearching for 427.22861415448472 (position 114)Found 427.22861415448472 in position 114 in 8 iterationsSearching for 517.38947721793272 (position 137)Found 514.38947721793272 in position 137 in 7 iterationsSearching for 480.00430310983609 (position 127)Found 477.00430310983609 in position 129 in 7 iterationsSearching for 326.21442304757835 (position 82)Found 322.21442304757835 in position 83 in 6 iterationsSearching for 645.95684682760088 (position 162)Found 641.95684682760088 in position 164 in 8 iterationsSearching for 950.56920072023684 (position 239)Found 947.56920072023684 in position 239 in 4 iterationsSearching for 825.06790368358406 (position 212)Found 825.06790368358406 in position 212 in 8 iterationsSearching for 801.96746726889853 (position 206)Found 797.96746726889853 in position 208 in 8 iterationsSearching for 598.49137852107299 (position 154)Found 597.49137852107299 in position 155 in 6 iterationsSearching for 182.50276192510759 (position 49)Found 178.50276192510759 in position 52 in 8 iterationsSearching for 910.81780449842836 (position 229)Found 909.81780449842836 in position 229 in 7 iterationsSearching for 178.50276192510759 (position 49)Found 178.50276192510759 in position 49 in 7 iterationsSearching for 193.48023926511428 (position 54)Found 192.48023926511428 in position 54 in 8 iterationsSearching for 10.19525742362743 (position 2)Found 6.1952574236274298 in position 3 in 6 iterationsSearching for 27.130954924161504 (position 7)Found 27.130954924161504 in position 7 in 5 iterationsSearching for 441.32816553239542 (position 120)Found 441.32816553239542 in position 120 in 8 iterationsSearching for 394.19336527603991 (position 103)Found 392.19336527603991 in position 104 in 8 iterationsSearching for 791.80480361339153 (position 202)Found 787.80480361339153 in position 203 in 6 iterationsSearching for 911.36002685628841 (position 228)Found 909.36002685628841 in position 229 in 7 iterationsSearching for 581.5159764397107 (position 151)Found 579.5159764397107 in position 151 in 5 iterationsSearching for 660.84780419324318 (position 173)Found 660.84780419324318 in position 173 in 7 iterationsSearching for 145.35254982146674 (position 38)Found 144.35254982146674 in position 38 in 8 iterationsSearching for 969.23734244819479 (position 248)Found 969.23734244819479 in position 248 in 8 iterationsSearching for 961.31171605578777 (position 242)Found 958.31171605578777 in position 243 in 6 iterationsSearching for 642.95684682760088 (position 162)Found 641.95684682760088 in position 162 in 8 iterationsSearching for 392.74477370525221 (position 102)Found 388.74477370525221 in position 103 in 5 iterationsSearching for 93.426343577379683 (position 21)Found 90.426343577379683 in position 22 in 8 iterationsSearching for 909.81780449842836 (position 229)Found 909.81780449842836 in position 229 in 7 iterationsSearching for 421.13409833063753 (position 111)Found 418.13409833063753 in position 111 in 4 iterationsSearching for 54.386059144871361 (position 13)Found 50.386059144871361 in position 13 in 7 iterationsSearching for 899.67143772698148 (position 225)Found 897.67143772698148 in position 225 in 7 iterationsSearching for 222.16763817255165 (position 61)Found 221.16763817255165 in position 61 in 7 iterationsSearching for 331.21738334299755 (position 84)Found 331.21738334299755 in position 84 in 8 iterationsSearching for 335.21738334299755 (position 84)Found 331.21738334299755 in position 87 in 5 iterationsSearching for 324.61827448347424 (position 83)Found 323.61827448347424 in position 83 in 6 iterationsSearching for 716.20126956999422 (position 188)Found 715.20126956999422 in position 190 in 8 iterationsSearching for 144.33121738334302 (position 37)Found 141.33121738334302 in position 37 in 7 iterationsSearching for 660.14795373393963 (position 168)Found 656.14795373393963 in position 171 in 6 iterationsSearching for 208.93890194402906 (position 57)Found 205.93890194402906 in position 57 in 7 iterationsSearching for 593.89019440290531 (position 153)Found 593.89019440290531 in position 153 in 7 iterationsSearching for 787.87502670369577 (position 200)Found 784.87502670369577 in position 202 in 8 iterationsSearching for 147.24469740897854 (position 39)Found 146.24469740897854 in position 39 in 5 iterationsSearching for 678.17520676290167 (position 181)Found 677.17520676290167 in position 181 in 7 iterationsSearching for 998.74163029877616 (position 254)Found 997.74163029877616 in position 255 in 9 iterationsSearching for 699.79149754325999 (position 184)Found 695.79149754325999 in position 184 in 8 iterationsSearching for 150.24469740897854 (position 39)Found 146.24469740897854 in position 40 in 8 iterationsSearching for 714.77401043733028 (position 187)Found 714.77401043733028 in position 187 in 6 iterationsSearching for 516.38947721793272 (position 137)Found 514.38947721793272 in position 137 in 7 iterationsSearching for 497.91866817224644 (position 132)Found 494.91866817224644 in position 132 in 8 iterationsSearching for 663.00750755333104 (position 175)Found 662.00750755333104 in position 175 in 4 iterationsSearching for 578.77642139957891 (position 149)Found 577.77642139957891 in position 149 in 7 iterationsSearching for 793.84954374828328 (position 203)Found 789.84954374828328 in position 204 in 8 iterationsSearching for 323.71190527054659 (position 80)Found 319.71190527054659 in position 83 in 6 iterationsSearching for 979.27082125308993 (position 251)Found 978.27082125308993 in position 251 in 6 iterationsSearching for 360.4256721701712 (position 95)Found 356.4256721701712 in position 95 in 3 iterationsSearching for 791.95532090212714 (position 204)Found 791.95532090212714 in position 204 in 8 iterationsSearching for 325.61827448347424 (position 83)Found 323.61827448347424 in position 83 in 6 iterationsSearching for 997.99496444593649 (position 253)Found 994.99496444593649 in position 254 in 8 iterationsSearching for 540.79717398602259 (position 139)Found 537.79717398602259 in position 141 in 7 iterationsSearching for 501.13733329264198 (position 133)Found 500.13733329264198 in position 134 in 8 iterationsSearching for 340.86330759605698 (position 88)Found 336.86330759605698 in position 91 in 6 iterationsSearching for 839.77782525101475 (position 214)Found 839.77782525101475 in position 214 in 8 iterationsSearching for 349.44611957152011 (position 94)Found 346.44611957152011 in position 94 in 8 iterationsSearching for 383.97677541428874 (position 100)Found 382.97677541428874 in position 100 in 8 iterationsSearching for 230.68230231635485 (position 62)Found 229.68230231635485 in position 63 in 2 iterationsSearching for 362.13263344218268 (position 97)Found 362.13263344218268 in position 97 in 7 iterationsSearching for 429.22861415448472 (position 114)Found 427.22861415448472 in position 115 in 6 iterationsSearching for 791.84954374828328 (position 203)Found 789.84954374828328 in position 203 in 6 iterationsSearching for 443.32816553239542 (position 120)Found 441.32816553239542 in position 121 in 7 iterationsSearching for 661.09402752769552 (position 169)Found 657.09402752769552 in position 173 in 7 iterationsSearching for 182.71639149143957 (position 50)Found 178.71639149143957 in position 52 in 8 iterationsSearching for 563.93731498153636 (position 146)Found 561.93731498153636 in position 146 in 8 iterationsSearching for 540.29969176305428 (position 141)Found 540.29969176305428 in position 141 in 7 iterationsSearching for 420.13409833063753 (position 110)Found 418.13409833063753 in position 111 in 4 iterationsSearching for 366.13263344218268 (position 97)Found 362.13263344218268 in position 97 in 7 iterationsSearching for 483.0490432447279 (position 129)Found 479.0490432447279 in position 130 in 8 iterationsSearching for 587.92486342967004 (position 152)Found 585.92486342967004 in position 152 in 8 iterationsSearching for 649.0768456068605 (position 165)Found 646.0768456068605 in position 165 in 7 iterationsSearching for 568.59953611865603 (position 147)Found 565.59953611865603 in position 147 in 6 iterationsSearching for 842.77782525101475 (position 214)Found 839.77782525101475 in position 214 in 8 iterationsSearching for 788.99710074159975 (position 201)Found 784.99710074159975 in position 202 in 8 iterationsSearching for 970.03289895321507 (position 244)Found 966.03289895321507 in position 249 in 7 iterationsSearching for 669.89046906949068 (position 177)Found 666.89046906949068 in position 178 in 8 iterationsSearching for 764.82436597796561 (position 197)Found 764.82436597796561 in position 197 in 7 iterationsSearching for 910.94906460768459 (position 227)Found 906.94906460768459 in position 229 in 7 iterationsSearching for 216.10364085818048 (position 58)Found 212.10364085818048 in position 60 in 8 iterationsSearching for 446.80727561265911 (position 122)Found 444.80727561265911 in position 123 in 6 iterationsSearching for 964.17334513382366 (position 243)Found 960.17334513382366 in position 243 in 6 iterationsSearching for 717.20126956999422 (position 188)Found 715.20126956999422 in position 190 in 8 iterationsSearching for 11.169774468214973 (position 4)Found 11.169774468214973 in position 4 in 8 iterationsSearching for 995.99496444593649 (position 253)Found 994.99496444593649 in position 253 in 7 iterationsSearching for 976.47230445265052 (position 250)Found 972.47230445265052 in position 250 in 8 iterations`
I don't believe that a linear search outperforms a binary search even with only 256 elements, anyway that is a lot of fun too.

#### guga

• Member
• Posts: 1112
• Assembly is a state of art.
##### Re: Fast Compare Real8 with SSE
« Reply #14 on: January 30, 2019, 10:11:36 PM »
Many Tks AW  :t :t :t. I´ll take a look :)
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com