News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Timings for fast median algo

Started by jj2007, July 24, 2020, 12:30:27 AM

Previous topic - Next topic

TimoVJL

159     bytes for MbMedianP
AMD Ryzen 5 3400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0191, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.67, precision loss is -0.0145%

1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
100 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.8, precision loss is 0.0205%

10000001 numbers generated
91 ms to find the median of 10 Mio items, which is 26505.42
86 ms to find the median of 10 Mio items, which is 26505.42
86 ms to find the median of 10 Mio items, which is 26505.42
1051 ms for sorting, the real median is 26510.57
Speed gain is a factor 12.0, precision loss is -0.0194%

100000010 numbers generated
933 ms to find the median of 100 Mio items, which is 26505.42
929 ms to find the median of 100 Mio items, which is 26505.42
931 ms to find the median of 100 Mio items, which is 26505.42
9461 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.2, precision loss is -0.0190%
hit any key
May the source be with you

jj2007

Quote from: guga on July 24, 2020, 01:12:17 PMIt´s working for integers, real4 and negative too ?

Yes, but I still have to adapt it for other formats. Right now it handles only REAL4, negative numbers included.

It's MbMedianP in the source.

Siekmanski

159     bytes for MbMedianP
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.126, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.10, precision loss is -0.0145%

1000000 numbers generated
4 ms to find the median of 1 Mio items, which is 26572.96
4 ms to find the median of 1 Mio items, which is 26572.96
4 ms to find the median of 1 Mio items, which is 26572.96
105 ms for sorting, the real median is 26567.51
Speed gain is a factor 23.5, precision loss is 0.0205%

10000001 numbers generated
41 ms to find the median of 10 Mio items, which is 26505.42
41 ms to find the median of 10 Mio items, which is 26505.42
41 ms to find the median of 10 Mio items, which is 26505.42
1160 ms for sorting, the real median is 26510.57
Speed gain is a factor 28.1, precision loss is -0.0194%

100000010 numbers generated
421 ms to find the median of 100 Mio items, which is 26505.42
420 ms to find the median of 100 Mio items, which is 26505.42
416 ms to find the median of 100 Mio items, which is 26505.42
10341 ms for sorting, the real median is 26510.47
Speed gain is a factor 24.6, precision loss is -0.0190%
hit any key
Creative coders use backward thinking techniques as a strategy.

guga

For what i saw we need to previously know the max and minimum values, right ?

Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).

The algo is calculating the middle of the table and getting the value of the last one in the table from the left, right ? I notice a small mistake when getting positive integers numbers.

At "FindSecondMiddleNumberLP" it is needed to change this:


         mov edx, eax ; Save the first middle number to calculate the average of 2 middle numbers.
FindSecondMiddleNumberLP: ; pretty rare case, mostly with small data sets
inc eax
cmp dword ptr[edi+4*eax], 1 ; ecx must be 1
jb FindSecondMiddleNumberLP
add eax, edx ; Second middle number found, add it to the first middle number.
shr eax, 1 ; And get the average.


To something like this


         mov edx, eax ; Save the first middle number to calculate the average of 2 middle numbers.
inc eax

FindSecondMiddleNumberLP: ; pretty rare case, mostly with small data sets
cmp dword ptr[edi+4*eax], 0 ; Search if the next address (so, the 1st one in the right table) is not empty
jne FoundFirstMiddleNumberRight ; If the address is already filled, exit the loop
                inc eax                                     ; search next address and jump over when the address at this position in the table is filled
        jmp FindSecondMiddleNumberLP ; <---- Sorry, forgot to port this when i was editing this post
FoundFirstMiddleNumberRight:

add eax, edx ; Second middle number found, add it to the first middle number.
shr eax, 1 ; And get the average.


I believe it is the proper masm version. In RosAsm, it is:


        mov edx eax     ; Save the first middle number to calculate the average of 2 middle numbers
        ; here we are always at the end of the left middle. Ex if size = 100, we are at pos 49th
        ; FindSecondMiddleNumberLP.
        inc eax
        while D$edi+eax*4 = 0
            inc eax
        End_while
        add eax edx
        shr eax 1           ; And get the average.


You can test it on these values (Positive integers):

[DataSelect9: D$ 136, 151, 216, 40, 13, 215, 24, 104,
                 230, 154, 227, 219, 52, 217, 38, 131,
                 228, 95, 178, 51, 70, 14, 9, 34,
                 80, 195, 59, 27, 100, 118, 78, 137,
                 65, 104, 7, 32, 23, 58, 202, 154,
                 238, 12, 11, 165, 223, 197, 98, 26,
                 231, 92, 29, 131, 25, 99, 97, 77,
                 155, 120, 154, 182, 177, 211, 103, 238,
                 209, 86, 205, 213, 180, 126, 55, 95,
                 186, 232, 196, 121, 237, 187, 26, 120,
                 13, 56, 186, 167, 14, 136, 90, 196,
                 120, 23, 190, 79, 164, 41, 239, 39,
                 18, 109, 35, 169] ; Size = 100, Median = 119

The reason is that when you see the data in the proper order, you will see something like this:

Left Middle Table

0 ___________________________49 (Position starting from zero)
7   9   11   12   13   13   (...)   118 <--- marinus code found this value when reached FindSecondMiddleNumberLP


Right Middle Table
50 ___________________________100 (Position starting from 50)
120   120   120   121   (...)   238   238   239
________
      |
and we need to find this

Since we have a gap of 2 (120-118), it means that the next address on the right that contains a value is the one 2 dword ahead. Since we already incremented this value at eax, we must 1st see if that address is not empty. If it is empty, we perform the loop untill the next address that is full and increment on each loop.


Can you please, test to see if this is correct. On the values i tested above, it found the proper median. (On positive integers it seems to work. Not sure how it will behave in negative or floats)
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

-> Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).

My routine is not the fault, must be something else.  :biggrin:

compare this part with with your conversion:

FindSecondMiddleNumberLP:
    inc     eax   
    mov     ecx,[edi+eax*4]
    test    ecx,ecx                 ; If zero, its not a number.
    jz      FindSecondMiddleNumberLP
Creative coders use backward thinking techniques as a strategy.

daydreamer

159     bytes for MbMedianP
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0437, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.80, precision loss is -0.0145%

1000000 numbers generated
5 ms to find the median of 1 Mio items, which is 26572.96
5 ms to find the median of 1 Mio items, which is 26572.96
5 ms to find the median of 1 Mio items, which is 26572.96
127 ms for sorting, the real median is 26567.51
Speed gain is a factor 22.9, precision loss is 0.0205%

10000001 numbers generated
56 ms to find the median of 10 Mio items, which is 26505.42
56 ms to find the median of 10 Mio items, which is 26505.42
56 ms to find the median of 10 Mio items, which is 26505.42
1357 ms for sorting, the real median is 26510.57
Speed gain is a factor 24.0, precision loss is -0.0194%

100000010 numbers generated
560 ms to find the median of 100 Mio items, which is 26505.42
565 ms to find the median of 100 Mio items, which is 26505.42
552 ms to find the median of 100 Mio items, which is 26505.42
12239 ms for sorting, the real median is 26510.47
Speed gain is a factor 21.9, precision loss is -0.0190%
hit any key
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

FORTRANS

Hi,

   Two results, the second may have run into trouble?  (Memory?)

159     bytes for MbMedianP
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 4.37, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.89, precision loss is -0.0145%

1000000 numbers generated
8 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
7 ms to find the median of 1 Mio items, which is 26572.96
214 ms for sorting, the real median is 26567.51
Speed gain is a factor 26.7, precision loss is 0.0205%

10000001 numbers generated
83 ms to find the median of 10 Mio items, which is 26505.42
82 ms to find the median of 10 Mio items, which is 26505.42
82 ms to find the median of 10 Mio items, which is 26505.42
2536 ms for sorting, the real median is 26510.57
Speed gain is a factor 30.6, precision loss is -0.0194%

100000010 numbers generated
839 ms to find the median of 100 Mio items, which is 26505.42
828 ms to find the median of 100 Mio items, which is 26505.42
842 ms to find the median of 100 Mio items, which is 26505.42
21065 ms for sorting, the real median is 26510.47
Speed gain is a factor 25.2, precision loss is -0.0190%
hit any key

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Stephan R. Novosad>ncd f:test
NCD-Norton Change Directory, Advanced Edition, (C) Copr 1987, Peter Norton

Changing to F:\TEMP\TEST

F:\TEMP\TEST>arraymed
159     bytes for MbMedianP
Intel(R) Pentium(R) M processor 1.70GHz (MMX, SSE, SSE2)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.145, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.14, precision loss is -0.0145%

1000000 numbers generated
22 ms to find the median of 1 Mio items, which is 26572.96
22 ms to find the median of 1 Mio items, which is 26572.96
22 ms to find the median of 1 Mio items, which is 26572.96
236 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.6, precision loss is 0.0205%

10000001 numbers generated
225 ms to find the median of 10 Mio items, which is 26505.42
223 ms to find the median of 10 Mio items, which is 26505.42
222 ms to find the median of 10 Mio items, which is 26505.42
2545 ms for sorting, the real median is 26510.57
Speed gain is a factor 11.4, precision loss is -0.0194%

100000010 numbers generated
21138 ms to find the median of 100 Mio items, which is 26505.42
2266 ms to find the median of 100 Mio items, which is 26505.42
2240 ms to find the median of 100 Mio items, which is 26505.42
23114 ms for sorting, the real median is 26510.47
Speed gain is a factor 2.70, precision loss is -0.0190%
hit any key


Regards,

Steve

jj2007

Steve,

You mean this one? Very slow but otherwise it looks ok...

100000010 numbers generated
21138 ms to find the median of 100 Mio items, which is 26505.42
2266 ms to find the median of 100 Mio items, which is 26505.42

FORTRANS

Hi Jochen,

   Yes, and the system became very sluggish and
somewhat unresponsive.  It is a fairly old laptop
though.

Regards,

Steve

jj2007

That's odd, because after the string "numbers generated" there is no HeapAlloc or similar that could slow down the system. The algo itself does not allocate anything. Is this behaviour reproducible? If it's a single core CPU, it could have been busy with other tasks...

FORTRANS

Hi,

Quote from: jj2007 on July 25, 2020, 01:05:14 AM
That's odd, because after the string "numbers generated" there is no HeapAlloc or similar that could slow down the system. The algo itself does not allocate anything. Is this behaviour reproducible?

   Sort of reproducible.  Two of around six runs below.
Longest was ~30xxx ms, the shortest ~67xx ms.

100000010 numbers generated
16467 ms to find the median of 100 Mio items, which is 26505.42
2245 ms to find the median of 100 Mio items, which is 26505.42
2233 ms to find the median of 100 Mio items, which is 26505.42

100000010 numbers generated
20118 ms to find the median of 100 Mio items, which is 26505.42
2235 ms to find the median of 100 Mio items, which is 26505.42
2237 ms to find the median of 100 Mio items, which is 26505.42


QuoteIf it's a single core CPU, it could have been busy with other tasks...

   Yes a single core CPU.  Sluggishness seems to
persist until rebooted.  Somewhere between weird and
strange.

Regards,

Steve

jj2007

hmmm... the 100 Mio items mean 400MB on the heap. But that's not the algo, it's just the creation of the array. Weird indeed.

guga

Quote from: Siekmanski on July 24, 2020, 08:52:25 PM
-> Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).

My routine is not the fault, must be something else.  :biggrin:

compare this part with with your conversion:

FindSecondMiddleNumberLP:
    inc     eax   
    mov     ecx,[edi+eax*4]
    test    ecx,ecx                 ; If zero, its not a number.
    jz      FindSecondMiddleNumberLP


Indeed. I mistakes the masm routines. Now it works like a gem :greensml: :greensml: :greensml:
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

jj2007

I stumbled over a set of data for which Marinus' (modified, using DWORDs) algo fails with an error of 3%: the algo returns 37386, but it should be 38532 :cool:
TestData dd 1332, 1575, 3891, 4226, 5629, 6868, 7295, 10283, 10375, 11866
dd 12064, 12231, 12895, 13365, 14384, 14490, 14501, 15995, 17093, 17275
dd 17833, 18070, 19096, 19988, 20168, 20765, 20861, 21012, 21344, 22404
dd 23377, 23947, 24033, 24401, 24709, 25594, 26025, 26058, 26928, 27084
dd 27571, 28237, 28792, 29066, 30189, 30724, 32569, 32765, 33198, 34558
dd 36270, 36987, 37167, 37288, 37573, 39491, 78260, 193983, 312869, 406851
dd 474584, 878091, 933154, 1535982, 1696834, 1826755, 1894374, 1935375, 1979114, 1980304
dd 1995476, 2163480, 2186807, 2364281, 2775589, 3086356, 3163282, 3296671, 3324133, 3368259
dd 3445517, 3506669, 3672249, 3797020, 4185277, 4691675, 5316867, 5403458, 6181758, 6361385
dd 6439150, 6544605, 6547329, 6819265, 6928806, 6963353, 7069587, 7384388, 7574183, 7697938
dd 7856932, 8106433, 8645157, 8771719, 8886262, 9014975, 9177706, 9481844, 9485408, 9488645

Siekmanski

My routine is not to blame, must be something else.  :cool:
Creative coders use backward thinking techniques as a strategy.