The MASM Forum

General => The Laboratory => Topic started by: jj2007 on July 24, 2020, 12:30:27 AM

Title: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 12:30:27 AM
May I have some timings please? This is related to Guga's thread, the algo is ruthlessly stolen and modified from Marinus :cool:

Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
1 ms to find the median of 0 Mio items, which is 44669
1 ms to find the median of 0 Mio items, which is 44669
4 ms for sorting, the real median is 44672
Speed gain is a factor 4.0, precision loss is -0.00703%
100000 numbers generated
11 ms to find the median of 0 Mio items, which is 48800
5 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
17 ms for sorting, the real median is 48801
Speed gain is a factor 2.5, precision loss is -0.00307%
1000000 numbers generated
14 ms to find the median of 1 Mio items, which is 93879
14 ms to find the median of 1 Mio items, which is 93879
15 ms to find the median of 1 Mio items, which is 93879
127 ms for sorting, the real median is 93880
Speed gain is a factor 8.9, precision loss is -0.00122%
10000000 numbers generated
146 ms to find the median of 10 Mio items, which is 543842
151 ms to find the median of 10 Mio items, which is 543842
142 ms to find the median of 10 Mio items, which is 543842
1284 ms for sorting, the real median is 543842
Speed gain is a factor 8.8, precision loss is -4.60e-05%
100000000 numbers generated
1432 ms to find the median of 100 Mio items, which is 5043857
1429 ms to find the median of 100 Mio items, which is 5043857
1437 ms to find the median of 100 Mio items, which is 5043857
11089 ms for sorting, the real median is 5043855
Speed gain is a factor 7.7, precision loss is 2.97e-05%
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 24, 2020, 01:02:06 AM
 :biggrin: Share & Enjoy

It couldn't open test.int
This is what I got so far.

Code: [Select]
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
2 ms for sorting, the real median is 44672
Speed gain is a factor 6.0, precision loss is -0.00703%
100000 numbers generated
2 ms to find the median of 0 Mio items, which is 48800
2 ms to find the median of 0 Mio items, which is 48800
2 ms to find the median of 0 Mio items, which is 48800
28 ms for sorting, the real median is 48801
Speed gain is a factor 14., precision loss is -0.00307%
1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
111 ms for sorting, the real median is 93880
Speed gain is a factor 16., precision loss is -0.00122%
10000000 numbers generated
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
1115 ms for sorting, the real median is 543842
Speed gain is a factor 16., precision loss is -4.60e-05%
100000000 numbers generated
702 ms to find the median of 100 Mio items, which is 5043857
691 ms to find the median of 100 Mio items, which is 5043857
694 ms to find the median of 100 Mio items, which is 5043857
9492 ms for sorting, the real median is 5043855
Speed gain is a factor 14., precision loss is 2.97e-05%
Title: Re: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 01:53:20 AM
A factor 14 for 100 Mio items - nice :thumbsup:
Title: Re: Timings for fast median algo
Post by: daydreamer on July 24, 2020, 02:18:28 AM
factor of 18 is great :thumbsup:
Code: [Select]
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
2 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
5 ms for sorting, the real median is 44672
Speed gain is a factor 7.5, precision loss is -0.00703%
100000 numbers generated
6 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
27 ms for sorting, the real median is 48801
Speed gain is a factor 5.8, precision loss is -0.00307%
1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
125 ms for sorting, the real median is 93880
Speed gain is a factor 18., precision loss is -0.00122%
10000000 numbers generated
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
1229 ms for sorting, the real median is 543842
Speed gain is a factor 18., precision loss is -4.60e-05%
100000000 numbers generated
678 ms to find the median of 100 Mio items, which is 5043857
673 ms to find the median of 100 Mio items, which is 5043857
673 ms to find the median of 100 Mio items, which is 5043857
10262 ms for sorting, the real median is 5043855
Speed gain is a factor 15., precision loss is 2.97e-05%
 
Title: Re: Timings for fast median algo
Post by: HSE on July 24, 2020, 02:47:31 AM
CalculateMedian it's failling because some lines have dropped somewhere  :biggrin::
Code: [Select]
FindMedianLP:
    inc     eax
    movzx   ecx,byte ptr[edi+eax]   ; Count of single and/or duplicate numbers. ( zero is not a number )
    add     edx,ecx                 ; Add it to the Median counter.
    cmp     edx,ebx                 ; Compare if we reached the middle of the PopulationSize.
    jbe     FindMedianLP            ; Repeat until we reached the middle.
    dec     esi                     ; Check even or uneven.
    js      Ready                   ; Ready if uneven.

    mov ecx, edx            <- missing
    sub ecx, ebx             <- missing

    cmp     ecx,1                   ; Check if next number is a duplicate. ( no averaging needed )
    ja      Ready                   ; Ready if next number is the same value as the previous number.
    mov     edx,eax                 ; Save the first middle number to calculate the average of 2 middle numbers.
FindSecondMiddleNumberLP:
Title: Re: Timings for fast median algo
Post by: guga on July 24, 2020, 02:51:25 AM
This is the results i´ve got :greenclp: :greenclp: :greenclp: :greenclp:

Code: [Select]
AMD Ryzen 5 2400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
2 ms for sorting, the real median is 44672
Speed gain is a factor 6.0, precision loss is -0.00703%
100000 numbers generated
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
18 ms for sorting, the real median is 48801
Speed gain is a factor 18., precision loss is -0.00307%
1000000 numbers generated
9 ms to find the median of 1 Mio items, which is 93879
15 ms to find the median of 1 Mio items, which is 93879
19 ms to find the median of 1 Mio items, which is 93879
149 ms for sorting, the real median is 93880
Speed gain is a factor 10., precision loss is -0.00122%
10000000 numbers generated
103 ms to find the median of 10 Mio items, which is 543842
123 ms to find the median of 10 Mio items, which is 543842
145 ms to find the median of 10 Mio items, which is 543842
1137 ms for sorting, the real median is 543842
Speed gain is a factor 9.2, precision loss is -4.60e-05%
100000000 numbers generated
1261 ms to find the median of 100 Mio items, which is 5043857
1143 ms to find the median of 100 Mio items, which is 5043857
1244 ms to find the median of 100 Mio items, which is 5043857
10776 ms for sorting, the real median is 5043855
Speed gain is a factor 8.9, precision loss is 2.97e-05%


Question. When you inputed the random numbers you limited them to positive integers from 0 to 255, right ? And also limited the amount of repetitions to a max of 255 too ? I´m asking because, if i understood correctly, the algo has a input limit of 255 that can be feeded on the buffer per pixel here "    inc     byte ptr [edi+eax]      ; Populate the Median buffer with the values as index numbers "

If there are, let´s say, a input containing 7000 numbers and from this input, 1000 numbers are the value 120, then the array buffer will extrapolate the  limit used on the byte storage. The solution is simply making the table contains dwords rather then bytes, so it could store  4294967295 repetitions per pixel. Not sure how it will affect the speed, though  :eusa_pray:

Title: Re: Timings for fast median algo
Post by: TimoVJL on July 24, 2020, 03:11:31 AM
As guga have AMD Ryzen, so
Code: [Select]
AMD Ryzen 5 3400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
1 ms for sorting, the real median is 44672
Speed gain is a factor 3.0, precision loss is -0.00703%
100000 numbers generated
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
8 ms for sorting, the real median is 48801
Speed gain is a factor 8.0, precision loss is -0.00307%
1000000 numbers generated
8 ms to find the median of 1 Mio items, which is 93879
9 ms to find the median of 1 Mio items, which is 93879
9 ms to find the median of 1 Mio items, which is 93879
90 ms for sorting, the real median is 93880
Speed gain is a factor 10., precision loss is -0.00122%
10000000 numbers generated
91 ms to find the median of 10 Mio items, which is 543842
93 ms to find the median of 10 Mio items, which is 543842
93 ms to find the median of 10 Mio items, which is 543842
916 ms for sorting, the real median is 543842
Speed gain is a factor 9.9, precision loss is -4.60e-05%
100000000 numbers generated
937 ms to find the median of 100 Mio items, which is 5043857
937 ms to find the median of 100 Mio items, which is 5043857
937 ms to find the median of 100 Mio items, which is 5043857
7719 ms for sorting, the real median is 5043855
Speed gain is a factor 8.2, precision loss is 2.97e-05%
 
Title: Re: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 03:25:57 AM
Question. When you inputed the random numbers you limited them to positive integers from 0 to 255, right ? And also limited the amount of repetitions to a max of 255 too ? I´m asking because, if i understood correctly, the algo has a input limit of 255 that can be feeded on the buffer per pixel here "    inc     byte ptr [edi+eax]      ; Populate the Median buffer with the values as index numbers "

If there are, let´s say, a input containing 7000 numbers and from this input, 1000 numbers are the value 120, then the array buffer will extrapolate the  limit used on the byte storage. The solution is simply making the table contains dwords rather then bytes, so it could store  4294967295 repetitions per pixel. Not sure how it will affect the speed, though  :eusa_pray:

Check MbMedianP proc. I am already using DWORDs, as you suggested :cool:
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 24, 2020, 04:05:47 AM
I'm missing a test for byte values between 0 - 255.
Lets say 1920 * 1080 = 2073600 byte sized numbers, and a 256 DWORD sized buffer.

Then it should perform at its best.
It's designed especially for these specs.   :cool:
Title: Re: Timings for fast median algo
Post by: nidud on July 24, 2020, 04:08:43 AM
Timings for MbMedian

Negative values:

    47827 cycles 0.asm: median
  2753682 cycles 1.asm: MbMedian

Positive values:

    48639 cycles 0.asm: median
  2460688 cycles 1.asm: MbMedian

So it handles negative values but fails on even counts.
Title: Re: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 04:52:07 AM
So it handles negative values but fails on even counts.

How does it "fail"? I see consistent results for odd and even element counts.

Besides, MbMedian is no longer the algo of choice, since Marinus' algo is a lot faster. Check MbMedianP
Title: Re: Timings for fast median algo
Post by: nidud on July 24, 2020, 05:27:07 AM
So it handles negative values but fails on even counts.

How does it "fail"? I see consistent results for odd and even element counts.

Input:(-120,-121,-122,-123,-124,-110,-100,-90,-80,-70,-60,-50,-40,-30,-20,-10) --> [-90,-80] --> -85. The returned value from Mb is -80.

Quote
Besides, MbMedian is no longer the algo of choice, since Marinus' algo is a lot faster. Check MbMedianP

timeit.asm:

procs equ <for x,<0,1,2>> ; add functions to test...

    .data
    info_0 db "median",0
    info_1 db "MbMedian",0
    info_2 db "MbMedianP",0

Dump the algo into 2.asm and see how it goes.
Title: Re: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 05:40:48 AM
I'm missing a test for byte values between 0 - 255.

I've modified your algo so it becomes an allrounder (see MbMedianP): integers, REALx, negative values etc. Byte values is a special case then, but it will be somewhat slower than a dedicated byte-sized algo.
Title: Re: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 12:30:40 PM
New test, now alternating odd and even element counts, and two runs for small numbers. It looks promising :tongue:
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.124, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.81, precision loss is -0.0145%

1000000 numbers generated
11 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
119 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.2, precision loss is 0.0205%

10000001 numbers generated
117 ms to find the median of 10 Mio items, which is 26505.42
115 ms to find the median of 10 Mio items, which is 26505.42
120 ms to find the median of 10 Mio items, which is 26505.42
1331 ms for sorting, the real median is 26510.57
Speed gain is a factor 11.3, precision loss is -0.0194%

100000010 numbers generated
1165 ms to find the median of 100 Mio items, which is 26505.42
1175 ms to find the median of 100 Mio items, which is 26505.42
1180 ms to find the median of 100 Mio items, which is 26505.42
11945 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.2, precision loss is -0.0190%
Title: Re: Timings for fast median algo
Post by: guga on July 24, 2020, 01:12:17 PM
Awesome :eusa_dance: :eusa_dance: :eusa_dance: :eusa_dance:


Code: [Select]
159     bytes for MbMedianP
AMD Ryzen 5 2400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0297, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.40, precision loss is -0.0145%

1000000 numbers generated
6 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
9 ms to find the median of 1 Mio items, which is 26572.96
110 ms for sorting, the real median is 26567.51
Speed gain is a factor 13.8, precision loss is 0.0205%

10000001 numbers generated
153 ms to find the median of 10 Mio items, which is 26505.42
133 ms to find the median of 10 Mio items, which is 26505.42
108 ms to find the median of 10 Mio items, which is 26505.42
1381 ms for sorting, the real median is 26510.57
Speed gain is a factor 10.5, precision loss is -0.0194%

100000010 numbers generated
1277 ms to find the median of 100 Mio items, which is 26505.42
1212 ms to find the median of 100 Mio items, which is 26505.42
1129 ms to find the median of 100 Mio items, which is 26505.42
12886 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.7, precision loss is -0.0190%
hit any key                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     


It´s working for integers, real4 and negative too ? :dazzled: :dazzled: :dazzled: :dazzled: :eusa_clap: :eusa_clap: :eusa_clap: :eusa_clap:

Which version ? This ? "MbMedianP_s"

If it is, did you checked the results ? I would like to give a test on other routines to see how to adapt to the algo we are creating for the image processing.
Title: Re: Timings for fast median algo
Post by: TimoVJL on July 24, 2020, 06:21:14 PM
Code: [Select]
159     bytes for MbMedianP
AMD Ryzen 5 3400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0191, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.67, precision loss is -0.0145%

1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
100 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.8, precision loss is 0.0205%

10000001 numbers generated
91 ms to find the median of 10 Mio items, which is 26505.42
86 ms to find the median of 10 Mio items, which is 26505.42
86 ms to find the median of 10 Mio items, which is 26505.42
1051 ms for sorting, the real median is 26510.57
Speed gain is a factor 12.0, precision loss is -0.0194%

100000010 numbers generated
933 ms to find the median of 100 Mio items, which is 26505.42
929 ms to find the median of 100 Mio items, which is 26505.42
931 ms to find the median of 100 Mio items, which is 26505.42
9461 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.2, precision loss is -0.0190%
hit any key
Title: Re: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 06:52:29 PM
It´s working for integers, real4 and negative too ?

Yes, but I still have to adapt it for other formats. Right now it handles only REAL4, negative numbers included.

It's MbMedianP in the source.
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 24, 2020, 07:48:30 PM
Code: [Select]
159     bytes for MbMedianP
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.126, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.10, precision loss is -0.0145%

1000000 numbers generated
4 ms to find the median of 1 Mio items, which is 26572.96
4 ms to find the median of 1 Mio items, which is 26572.96
4 ms to find the median of 1 Mio items, which is 26572.96
105 ms for sorting, the real median is 26567.51
Speed gain is a factor 23.5, precision loss is 0.0205%

10000001 numbers generated
41 ms to find the median of 10 Mio items, which is 26505.42
41 ms to find the median of 10 Mio items, which is 26505.42
41 ms to find the median of 10 Mio items, which is 26505.42
1160 ms for sorting, the real median is 26510.57
Speed gain is a factor 28.1, precision loss is -0.0194%

100000010 numbers generated
421 ms to find the median of 100 Mio items, which is 26505.42
420 ms to find the median of 100 Mio items, which is 26505.42
416 ms to find the median of 100 Mio items, which is 26505.42
10341 ms for sorting, the real median is 26510.47
Speed gain is a factor 24.6, precision loss is -0.0190%
hit any key
Title: Re: Timings for fast median algo
Post by: guga on July 24, 2020, 07:54:49 PM
For what i saw we need to previously know the max and minimum values, right ?

Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).

The algo is calculating the middle of the table and getting the value of the last one in the table from the left, right ? I notice a small mistake when getting positive integers numbers.

At "FindSecondMiddleNumberLP" it is needed to change this:

Code: [Select]
         mov edx, eax ; Save the first middle number to calculate the average of 2 middle numbers.
FindSecondMiddleNumberLP: ; pretty rare case, mostly with small data sets
inc eax
cmp dword ptr[edi+4*eax], 1 ; ecx must be 1
jb FindSecondMiddleNumberLP
add eax, edx ; Second middle number found, add it to the first middle number.
shr eax, 1 ; And get the average.

To something like this

Code: [Select]
         mov edx, eax ; Save the first middle number to calculate the average of 2 middle numbers.
inc eax

FindSecondMiddleNumberLP: ; pretty rare case, mostly with small data sets
cmp dword ptr[edi+4*eax], 0 ; Search if the next address (so, the 1st one in the right table) is not empty
jne FoundFirstMiddleNumberRight ; If the address is already filled, exit the loop
                inc eax                                     ; search next address and jump over when the address at this position in the table is filled
        jmp FindSecondMiddleNumberLP ; <---- Sorry, forgot to port this when i was editing this post
FoundFirstMiddleNumberRight:

add eax, edx ; Second middle number found, add it to the first middle number.
shr eax, 1 ; And get the average.

I believe it is the proper masm version. In RosAsm, it is:

Code: [Select]
        mov edx eax     ; Save the first middle number to calculate the average of 2 middle numbers
        ; here we are always at the end of the left middle. Ex if size = 100, we are at pos 49th
        ; FindSecondMiddleNumberLP.
        inc eax
        while D$edi+eax*4 = 0
            inc eax
        End_while
        add eax edx
        shr eax 1           ; And get the average.

You can test it on these values (Positive integers):

[DataSelect9: D$ 136, 151, 216, 40, 13, 215, 24, 104,
                 230, 154, 227, 219, 52, 217, 38, 131,
                 228, 95, 178, 51, 70, 14, 9, 34,
                 80, 195, 59, 27, 100, 118, 78, 137,
                 65, 104, 7, 32, 23, 58, 202, 154,
                 238, 12, 11, 165, 223, 197, 98, 26,
                 231, 92, 29, 131, 25, 99, 97, 77,
                 155, 120, 154, 182, 177, 211, 103, 238,
                 209, 86, 205, 213, 180, 126, 55, 95,
                 186, 232, 196, 121, 237, 187, 26, 120,
                 13, 56, 186, 167, 14, 136, 90, 196,
                 120, 23, 190, 79, 164, 41, 239, 39,
                 18, 109, 35, 169] ; Size = 100, Median = 119

The reason is that when you see the data in the proper order, you will see something like this:

Left Middle Table

0 ___________________________49 (Position starting from zero)
7   9   11   12   13   13   (...)   118 <--- marinus code found this value when reached FindSecondMiddleNumberLP


Right Middle Table
50 ___________________________100 (Position starting from 50)
120   120   120   121   (...)   238   238   239
________
      |
and we need to find this

Since we have a gap of 2 (120-118), it means that the next address on the right that contains a value is the one 2 dword ahead. Since we already incremented this value at eax, we must 1st see if that address is not empty. If it is empty, we perform the loop untill the next address that is full and increment on each loop.


Can you please, test to see if this is correct. On the values i tested above, it found the proper median. (On positive integers it seems to work. Not sure how it will behave in negative or floats)
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 24, 2020, 08:52:25 PM
-> Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).

My routine is not the fault, must be something else.  :biggrin:

compare this part with with your conversion:

Code: [Select]
FindSecondMiddleNumberLP:
    inc     eax   
    mov     ecx,[edi+eax*4]
    test    ecx,ecx                 ; If zero, its not a number.
    jz      FindSecondMiddleNumberLP
Title: Re: Timings for fast median algo
Post by: daydreamer on July 24, 2020, 10:01:04 PM
Code: [Select]
159     bytes for MbMedianP
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0437, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.80, precision loss is -0.0145%

1000000 numbers generated
5 ms to find the median of 1 Mio items, which is 26572.96
5 ms to find the median of 1 Mio items, which is 26572.96
5 ms to find the median of 1 Mio items, which is 26572.96
127 ms for sorting, the real median is 26567.51
Speed gain is a factor 22.9, precision loss is 0.0205%

10000001 numbers generated
56 ms to find the median of 10 Mio items, which is 26505.42
56 ms to find the median of 10 Mio items, which is 26505.42
56 ms to find the median of 10 Mio items, which is 26505.42
1357 ms for sorting, the real median is 26510.57
Speed gain is a factor 24.0, precision loss is -0.0194%

100000010 numbers generated
560 ms to find the median of 100 Mio items, which is 26505.42
565 ms to find the median of 100 Mio items, which is 26505.42
552 ms to find the median of 100 Mio items, which is 26505.42
12239 ms for sorting, the real median is 26510.47
Speed gain is a factor 21.9, precision loss is -0.0190%
hit any key
Title: Re: Timings for fast median algo
Post by: FORTRANS on July 24, 2020, 10:49:47 PM
Hi,

   Two results, the second may have run into trouble?  (Memory?)

Code: [Select]
159     bytes for MbMedianP
Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 4.37, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.89, precision loss is -0.0145%

1000000 numbers generated
8 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
7 ms to find the median of 1 Mio items, which is 26572.96
214 ms for sorting, the real median is 26567.51
Speed gain is a factor 26.7, precision loss is 0.0205%

10000001 numbers generated
83 ms to find the median of 10 Mio items, which is 26505.42
82 ms to find the median of 10 Mio items, which is 26505.42
82 ms to find the median of 10 Mio items, which is 26505.42
2536 ms for sorting, the real median is 26510.57
Speed gain is a factor 30.6, precision loss is -0.0194%

100000010 numbers generated
839 ms to find the median of 100 Mio items, which is 26505.42
828 ms to find the median of 100 Mio items, which is 26505.42
842 ms to find the median of 100 Mio items, which is 26505.42
21065 ms for sorting, the real median is 26510.47
Speed gain is a factor 25.2, precision loss is -0.0190%
hit any key

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Stephan R. Novosad>ncd f:test
NCD-Norton Change Directory, Advanced Edition, (C) Copr 1987, Peter Norton

Changing to F:\TEMP\TEST

F:\TEMP\TEST>arraymed
159     bytes for MbMedianP
Intel(R) Pentium(R) M processor 1.70GHz (MMX, SSE, SSE2)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.145, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.14, precision loss is -0.0145%

1000000 numbers generated
22 ms to find the median of 1 Mio items, which is 26572.96
22 ms to find the median of 1 Mio items, which is 26572.96
22 ms to find the median of 1 Mio items, which is 26572.96
236 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.6, precision loss is 0.0205%

10000001 numbers generated
225 ms to find the median of 10 Mio items, which is 26505.42
223 ms to find the median of 10 Mio items, which is 26505.42
222 ms to find the median of 10 Mio items, which is 26505.42
2545 ms for sorting, the real median is 26510.57
Speed gain is a factor 11.4, precision loss is -0.0194%

100000010 numbers generated
21138 ms to find the median of 100 Mio items, which is 26505.42
2266 ms to find the median of 100 Mio items, which is 26505.42
2240 ms to find the median of 100 Mio items, which is 26505.42
23114 ms for sorting, the real median is 26510.47
Speed gain is a factor 2.70, precision loss is -0.0190%
hit any key

Regards,

Steve
Title: Re: Timings for fast median algo
Post by: jj2007 on July 24, 2020, 11:36:33 PM
Steve,

You mean this one? Very slow but otherwise it looks ok...

100000010 numbers generated
21138 ms to find the median of 100 Mio items, which is 26505.42
2266 ms to find the median of 100 Mio items, which is 26505.42
Title: Re: Timings for fast median algo
Post by: FORTRANS on July 25, 2020, 01:00:04 AM
Hi Jochen,

   Yes, and the system became very sluggish and
somewhat unresponsive.  It is a fairly old laptop
though.

Regards,

Steve
Title: Re: Timings for fast median algo
Post by: jj2007 on July 25, 2020, 01:05:14 AM
That's odd, because after the string "numbers generated" there is no HeapAlloc or similar that could slow down the system. The algo itself does not allocate anything. Is this behaviour reproducible? If it's a single core CPU, it could have been busy with other tasks...
Title: Re: Timings for fast median algo
Post by: FORTRANS on July 25, 2020, 03:53:17 AM
Hi,

That's odd, because after the string "numbers generated" there is no HeapAlloc or similar that could slow down the system. The algo itself does not allocate anything. Is this behaviour reproducible?

   Sort of reproducible.  Two of around six runs below.
Longest was ~30xxx ms, the shortest ~67xx ms.

Code: [Select]
100000010 numbers generated
16467 ms to find the median of 100 Mio items, which is 26505.42
2245 ms to find the median of 100 Mio items, which is 26505.42
2233 ms to find the median of 100 Mio items, which is 26505.42

100000010 numbers generated
20118 ms to find the median of 100 Mio items, which is 26505.42
2235 ms to find the median of 100 Mio items, which is 26505.42
2237 ms to find the median of 100 Mio items, which is 26505.42

Quote
If it's a single core CPU, it could have been busy with other tasks...

   Yes a single core CPU.  Sluggishness seems to
persist until rebooted.  Somewhere between weird and
strange.

Regards,

Steve
Title: Re: Timings for fast median algo
Post by: jj2007 on July 25, 2020, 04:08:11 AM
hmmm... the 100 Mio items mean 400MB on the heap. But that's not the algo, it's just the creation of the array. Weird indeed.
Title: Re: Timings for fast median algo
Post by: guga on July 25, 2020, 07:15:23 AM
-> Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).

My routine is not the fault, must be something else.  :biggrin:

compare this part with with your conversion:

Code: [Select]
FindSecondMiddleNumberLP:
    inc     eax   
    mov     ecx,[edi+eax*4]
    test    ecx,ecx                 ; If zero, its not a number.
    jz      FindSecondMiddleNumberLP

Indeed. I mistakes the masm routines. Now it works like a gem :greensml: :greensml: :greensml:
Title: Re: Timings for fast median algo
Post by: jj2007 on July 25, 2020, 06:34:26 PM
I stumbled over a set of data for which Marinus' (modified, using DWORDs) algo fails with an error of 3%: the algo returns 37386, but it should be 38532 :cool:
Code: [Select]
TestData dd 1332, 1575, 3891, 4226, 5629, 6868, 7295, 10283, 10375, 11866
dd 12064, 12231, 12895, 13365, 14384, 14490, 14501, 15995, 17093, 17275
dd 17833, 18070, 19096, 19988, 20168, 20765, 20861, 21012, 21344, 22404
dd 23377, 23947, 24033, 24401, 24709, 25594, 26025, 26058, 26928, 27084
dd 27571, 28237, 28792, 29066, 30189, 30724, 32569, 32765, 33198, 34558
dd 36270, 36987, 37167, 37288, 37573, 39491, 78260, 193983, 312869, 406851
dd 474584, 878091, 933154, 1535982, 1696834, 1826755, 1894374, 1935375, 1979114, 1980304
dd 1995476, 2163480, 2186807, 2364281, 2775589, 3086356, 3163282, 3296671, 3324133, 3368259
dd 3445517, 3506669, 3672249, 3797020, 4185277, 4691675, 5316867, 5403458, 6181758, 6361385
dd 6439150, 6544605, 6547329, 6819265, 6928806, 6963353, 7069587, 7384388, 7574183, 7697938
dd 7856932, 8106433, 8645157, 8771719, 8886262, 9014975, 9177706, 9481844, 9485408, 9488645
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 25, 2020, 08:04:02 PM
My routine is not to blame, must be something else.  :cool:
Title: Re: Timings for fast median algo
Post by: jj2007 on July 25, 2020, 08:36:58 PM
My routine is not to blame, must be something else.  :cool:

You do it without normalisation: mov eax, (9488645+1) * 4, and that works :sad:

It's not a blame game, Marinus, I am just trying to understand why my version misbehaves for that specific case. Usually the deviation is in the 0.02% range, here it is 3%, a factor 150 higher...

Most probably it has to with resolution. The range here is 1332...9488645, and the real median is 38532, with is 0.4% of the max value. The distribution looks a bit extreme, see below. If I increase the buffer to 400,000 bytes, all is fine again.
Code: [Select]
179     bytes for ArrInf1
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00
0 ms to find the median of 11 items, which is 31328.00
0 ms to find the median of 11 items, which is 31328.00
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0245, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00
0 ms to find the median of 110 items, which is 38522.00
0 ms to find the median of 110 items, which is 38522.00
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.457, precision loss is -0.0260%

1000001 numbers generated
5 ms to find the median of 1 Mio items, which is 39751.00
5 ms to find the median of 1 Mio items, which is 39751.00
5 ms to find the median of 1 Mio items, which is 39751.00
100 ms for sorting, the real median is 39792.00
Speed gain is a factor 17.1, precision loss is -0.103%

10000010 numbers generated
60 ms to find the median of 10 Mio items, which is 39825.00
57 ms to find the median of 10 Mio items, which is 39825.00
57 ms to find the median of 10 Mio items, which is 39825.00
1048 ms for sorting, the real median is 39796.00
Speed gain is a factor 17.9, precision loss is 0.0729%

100000101 numbers generated
615 ms to find the median of 100 Mio items, which is 39820.00
587 ms to find the median of 100 Mio items, which is 39820.00
591 ms to find the median of 100 Mio items, which is 39820.00
9960 ms for sorting, the real median is 39796.00
Speed gain is a factor 16.7, precision loss is 0.0603%
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 25, 2020, 09:04:49 PM
That wasn't my intention, probably misunderstood the situation.
Sorry, if I gave you that feeling.

I just allocated the minimum buffer size for the data set.
What do you mean exactly with the normalisation?
Title: Re: Timings for fast median algo
Post by: jj2007 on July 25, 2020, 09:14:41 PM
No bad feeling here, Marinus :biggrin:

What I am doing here is to adapt the algo to any range of values by normalising it:
Code: [Select]
aiPop:
  if aiInt
fild aiSize$ PTR [esi+aiSize*ecx] ; get original integer
  else
fld aiSize$ PTR [esi+aiSize*ecx] ; get original real
  endif
  fsub arrInf.yMin ; make positive
  fmul arrInf.yRange ; normalise 0...99999
  fistp arrInf.yIndex ; we need a
  mov eax, arrInf.yIndex ; reg32
  inc dword ptr [edi+4*eax] ; Populate the Median buffer with the values as index numbers and keep track for duplicates in one go
  dec ecx
  jns aiPop

That works fine, but (see image above) for very skewed distributions the deviation can be significant.

Results for the latest version (attached):
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0149, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.444, precision loss is -0.0260%

1000001 numbers generated
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
7 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
112 ms for sorting, the real median is 39792.00
Speed gain is a factor 13.4, precision loss is -0.103%

10000010 numbers generated
83 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
87 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
84 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
1146 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.5, precision loss is 0.0729%

100000101 numbers generated
848 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
842 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
841 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
10829 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.8, precision loss is 0.0603%
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 25, 2020, 09:20:31 PM
 :thumbsup:

Hi Jochen,

Now I understand what your goal is.
What about this idea to speed it up multiple times.
Multithreading.
Divide the data set in equal parts ( / max cpu thread count )
1 Median Buffer for each thread, when done adding the buffers together per index number and find the median, will speed it up.
Title: Re: Timings for fast median algo
Post by: jj2007 on July 25, 2020, 09:23:24 PM
Well.... at 842 ms to find the median of 100 Mio items, I am not eager to dive into the secrecies of multithreaded programs ;-)

Google How do you find the median of a large data set? to see this (https://www.dummies.com/education/math/statistics/how-to-find-the-median-value-in-a-statistical-data-set/):
Quote
Order the values from smallest to largest.
If the data set contains an odd number of values , choose the one that is exactly in the middle. You've found the median.
If the data set contains an even number of values , take the two values that appear in the middle and average them to find the median.

So they think sorting is the way to go, which implies that their idea of a "large" data set is different from ours :tongue:
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 25, 2020, 09:53:46 PM
Sometimes, thinking outside of the box can give you a reward.
Most ideas are based on other ideas without understanding why the first idea was constructed that way.
Many ancient fast routines were designed to work for small memory sizes and slow memory reads and writes. Not a problem anymore.

I came up with this idea when I wrote a routine to find the unique number count of colors in a bitmap image.
When guga needed a fast median routine, I knew this would be fast, especially in a small range ( 0-255 ) and millions of colors.

Later it turned out to be for only 100 colors per iteration.
Title: Re: Timings for fast median algo
Post by: jj2007 on July 25, 2020, 10:22:42 PM
This is the goal :cool:
Code: [Select]
Dim MyArray() As REAL8   ; or DWORD, REAL4, ...
... fill it with values ...
Print Str$("The array has %i elements\n", MyArray(?))
Print Str$("The median is %f (a proxy, typically 0.02% off) \n", MyArray(median?))
Print Str$("The average is %f\n", MyArray(average?))
Print Str$("The minimum is %f\n", MyArray(min?))
Print Str$("The maximum is %f\n", MyArray(max?))
Title: Re: Timings for fast median algo
Post by: daydreamer on July 25, 2020, 11:58:19 PM
I think multithreading could be a good idea

maybe would be useful for Guga to have extra thread dedicated to filereading/filewriting in background,instead of main thread switches between fast algos running median/image/videoframe processing and to slow write processed images/data to disk when RAM is filled and readin unprocessed images/data ,to start the main loop again
Title: Re: Timings for fast median algo
Post by: guga on July 26, 2020, 09:31:47 AM
Multi-thread could be a good way to go, but 1st we need to see if the watermark remover algo will work as expected (i presume it will, but we didn´t get there yet :greensml: :greensml: :greensml:)

Btw....Someone has the timming of quicksort algo ? I would like to compare it with one thing i´m finishing here and will post soon :azn: :azn: :azn:.
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 26, 2020, 05:13:22 PM
In the MASM  lib you can find Qsort routines.
Title: Re: Timings for fast median algo
Post by: jj2007 on July 26, 2020, 06:42:52 PM
Btw....Someone has the timming of quicksort algo ?

No idea what you are talking about (http://masm32.com/board/index.php?topic=3643.msg38433#msg38433) :biggrin:
Title: Re: Timings for fast median algo
Post by: guga on July 26, 2020, 06:43:46 PM
HI Marinus...I know, but i wanted just to compare the results of qsort and another sorting algo called Bitonic sort. I opened a thread and posted the source (in masm too) to benchmark later.

Btw... How do i exchange the order of 2 floats in SSE from different registers ? Ex:

xmm1 = 1, 2, 3, 4
xmm2 = 5, 6, 7, 8

How do i exchange the order of the low and hi qword on both of them at once in a way that result in:

xmm1 = 1, 2, 5, 6
xmm2 = 3, 4, 6, 7
Title: Re: Timings for fast median algo
Post by: guga on July 26, 2020, 06:48:59 PM
Btw....Someone has the timming of quicksort algo ?

No idea what you are talking about (http://masm32.com/board/index.php?topic=3643.msg38433#msg38433) :biggrin:

Wow...I didn´t remember that

 :biggrin: :biggrin: :biggrin: :biggrin: :biggrin: :bgrin: :bgrin: :bgrin:
Title: Re: Timings for fast median algo
Post by: LiaoMi on July 26, 2020, 07:03:24 PM
No bad feeling here, Marinus :biggrin:

What I am doing here is to adapt the algo to any range of values by normalising it:
Code: [Select]
aiPop:
  if aiInt
fild aiSize$ PTR [esi+aiSize*ecx] ; get original integer
  else
fld aiSize$ PTR [esi+aiSize*ecx] ; get original real
  endif
  fsub arrInf.yMin ; make positive
  fmul arrInf.yRange ; normalise 0...99999
  fistp arrInf.yIndex ; we need a
  mov eax, arrInf.yIndex ; reg32
  inc dword ptr [edi+4*eax] ; Populate the Median buffer with the values as index numbers and keep track for duplicates in one go
  dec ecx
  jns aiPop

That works fine, but (see image above) for very skewed distributions the deviation can be significant.

Results for the latest version (attached):
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0149, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.444, precision loss is -0.0260%

1000001 numbers generated
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
7 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
112 ms for sorting, the real median is 39792.00
Speed gain is a factor 13.4, precision loss is -0.103%

10000010 numbers generated
83 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
87 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
84 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
1146 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.5, precision loss is 0.0729%

100000101 numbers generated
848 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
842 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
841 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
10829 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.8, precision loss is 0.0603%

Code: [Select]
179     bytes for ArrInf1
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0166, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.488, precision loss is -0.0260%

1000001 numbers generated
7 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
6 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
6 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
91 ms for sorting, the real median is 39792.00
Speed gain is a factor 14.1, precision loss is -0.103%

10000010 numbers generated
70 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
72 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
72 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
955 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.3, precision loss is 0.0729%

100000101 numbers generated
712 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
678 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
686 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
8931 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.9, precision loss is 0.0603%
hit any key 
Title: Re: Timings for fast median algo
Post by: Siekmanski on July 26, 2020, 08:05:19 PM
HI Marinus...I know, but i wanted just to compare the results of qsort and another sorting algo called Bitonic sort. I opened a thread and posted the source (in masm too) to benchmark later.

Btw... How do i exchange the order of 2 floats in SSE from different registers ? Ex:

xmm1 = 1, 2, 3, 4
xmm2 = 5, 6, 7, 8

How do i exchange the order of the low and hi qword on both of them at once in a way that result in:

xmm1 = 1, 2, 5, 6
xmm2 = 3, 4, 6, 7

With one or more of these beauties,

movlps movhps
shufps pshufd
unpcklps unpckhps

Play with them, once you understand how they work, they will be your best friends forever.  :eusa_dance:
Title: Re: Timings for fast median algo
Post by: guga on July 27, 2020, 09:00:21 AM
JJ. Does the adaptation you did to the algo works for negative limit as well ?

Ex..

  fsub arrInf.yMin   ; make positive
  fmul arrInf.yRange   ; normalise 0...99999

If the minimum limit is...say -1 and the max to 1 can this work too if i set those values ?
Title: Re: Timings for fast median algo
Post by: jj2007 on July 27, 2020, 05:43:26 PM
If the minimum limit is...say -1 and the max to 1 can this work too if i set those values ?

Like this?

include \masm32\MasmBasic\MasmBasic.inc
  Init
  Dim MyArray() As REAL8        ; can be DWORD, REAL4, REAL8
  For_ ecx=0 To 9999999
        Rand(-1.0, 1.0, MyArray(ecx))   ; put random number into Real8 array
  Next
  PrintLine Str$("The elements:\t%i", MyArray(?))
  PrintLine Str$("The minimum:\t%f", MyArray(?min)v)
  PrintLine Str$("The maximum:\t%9f", MyArray(?max)v)
  PrintLine Str$("The mean:\t%9f", MyArray(?mean)v)
  PrintLine Str$("The median:\t%f", MyArray(?median)v)
EndOfCode


Code: [Select]
The elements:   10000000
The minimum:    -0.9999998
The maximum:    0.999999964
The mean:       -1.45892021e-05
The median:     6.515711e-05
Title: Re: Timings for fast median algo
Post by: guga on July 27, 2020, 05:47:19 PM
Yes. perfect. Testing it right now.  :greenclp: :greenclp:

Tks a lot. :thumbsup: :thumbsup: :thumbsup: :thumbsup: