News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Timings for fast median algo

Started by jj2007, July 24, 2020, 12:30:27 AM

Previous topic - Next topic

jj2007

May I have some timings please? This is related to Guga's thread, the algo is ruthlessly stolen and modified from Marinus :cool:

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
1 ms to find the median of 0 Mio items, which is 44669
1 ms to find the median of 0 Mio items, which is 44669
4 ms for sorting, the real median is 44672
Speed gain is a factor 4.0, precision loss is -0.00703%
100000 numbers generated
11 ms to find the median of 0 Mio items, which is 48800
5 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
17 ms for sorting, the real median is 48801
Speed gain is a factor 2.5, precision loss is -0.00307%
1000000 numbers generated
14 ms to find the median of 1 Mio items, which is 93879
14 ms to find the median of 1 Mio items, which is 93879
15 ms to find the median of 1 Mio items, which is 93879
127 ms for sorting, the real median is 93880
Speed gain is a factor 8.9, precision loss is -0.00122%
10000000 numbers generated
146 ms to find the median of 10 Mio items, which is 543842
151 ms to find the median of 10 Mio items, which is 543842
142 ms to find the median of 10 Mio items, which is 543842
1284 ms for sorting, the real median is 543842
Speed gain is a factor 8.8, precision loss is -4.60e-05%
100000000 numbers generated
1432 ms to find the median of 100 Mio items, which is 5043857
1429 ms to find the median of 100 Mio items, which is 5043857
1437 ms to find the median of 100 Mio items, which is 5043857
11089 ms for sorting, the real median is 5043855
Speed gain is a factor 7.7, precision loss is 2.97e-05%

Siekmanski

 :biggrin: Share & Enjoy

It couldn't open test.int
This is what I got so far.

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
2 ms for sorting, the real median is 44672
Speed gain is a factor 6.0, precision loss is -0.00703%
100000 numbers generated
2 ms to find the median of 0 Mio items, which is 48800
2 ms to find the median of 0 Mio items, which is 48800
2 ms to find the median of 0 Mio items, which is 48800
28 ms for sorting, the real median is 48801
Speed gain is a factor 14., precision loss is -0.00307%
1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
111 ms for sorting, the real median is 93880
Speed gain is a factor 16., precision loss is -0.00122%
10000000 numbers generated
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
1115 ms for sorting, the real median is 543842
Speed gain is a factor 16., precision loss is -4.60e-05%
100000000 numbers generated
702 ms to find the median of 100 Mio items, which is 5043857
691 ms to find the median of 100 Mio items, which is 5043857
694 ms to find the median of 100 Mio items, which is 5043857
9492 ms for sorting, the real median is 5043855
Speed gain is a factor 14., precision loss is 2.97e-05%
Creative coders use backward thinking techniques as a strategy.

jj2007

A factor 14 for 100 Mio items - nice :thumbsup:

daydreamer

factor of 18 is great :thumbsup:
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
2 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
5 ms for sorting, the real median is 44672
Speed gain is a factor 7.5, precision loss is -0.00703%
100000 numbers generated
6 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
27 ms for sorting, the real median is 48801
Speed gain is a factor 5.8, precision loss is -0.00307%
1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
125 ms for sorting, the real median is 93880
Speed gain is a factor 18., precision loss is -0.00122%
10000000 numbers generated
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
1229 ms for sorting, the real median is 543842
Speed gain is a factor 18., precision loss is -4.60e-05%
100000000 numbers generated
678 ms to find the median of 100 Mio items, which is 5043857
673 ms to find the median of 100 Mio items, which is 5043857
673 ms to find the median of 100 Mio items, which is 5043857
10262 ms for sorting, the real median is 5043855
Speed gain is a factor 15., precision loss is 2.97e-05%
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

HSE

CalculateMedian it's failling because some lines have dropped somewhere  :biggrin::
FindMedianLP:
    inc     eax
    movzx   ecx,byte ptr[edi+eax]   ; Count of single and/or duplicate numbers. ( zero is not a number )
    add     edx,ecx                 ; Add it to the Median counter.
    cmp     edx,ebx                 ; Compare if we reached the middle of the PopulationSize.
    jbe     FindMedianLP            ; Repeat until we reached the middle.
    dec     esi                     ; Check even or uneven.
    js      Ready                   ; Ready if uneven.

    mov ecx, edx            <- missing
    sub ecx, ebx             <- missing

    cmp     ecx,1                   ; Check if next number is a duplicate. ( no averaging needed )
    ja      Ready                   ; Ready if next number is the same value as the previous number.
    mov     edx,eax                 ; Save the first middle number to calculate the average of 2 middle numbers.
FindSecondMiddleNumberLP:
Equations in Assembly: SmplMath

guga

This is the results i´ve got :greenclp: :greenclp: :greenclp: :greenclp:


AMD Ryzen 5 2400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
2 ms for sorting, the real median is 44672
Speed gain is a factor 6.0, precision loss is -0.00703%
100000 numbers generated
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
18 ms for sorting, the real median is 48801
Speed gain is a factor 18., precision loss is -0.00307%
1000000 numbers generated
9 ms to find the median of 1 Mio items, which is 93879
15 ms to find the median of 1 Mio items, which is 93879
19 ms to find the median of 1 Mio items, which is 93879
149 ms for sorting, the real median is 93880
Speed gain is a factor 10., precision loss is -0.00122%
10000000 numbers generated
103 ms to find the median of 10 Mio items, which is 543842
123 ms to find the median of 10 Mio items, which is 543842
145 ms to find the median of 10 Mio items, which is 543842
1137 ms for sorting, the real median is 543842
Speed gain is a factor 9.2, precision loss is -4.60e-05%
100000000 numbers generated
1261 ms to find the median of 100 Mio items, which is 5043857
1143 ms to find the median of 100 Mio items, which is 5043857
1244 ms to find the median of 100 Mio items, which is 5043857
10776 ms for sorting, the real median is 5043855
Speed gain is a factor 8.9, precision loss is 2.97e-05%



Question. When you inputed the random numbers you limited them to positive integers from 0 to 255, right ? And also limited the amount of repetitions to a max of 255 too ? I´m asking because, if i understood correctly, the algo has a input limit of 255 that can be feeded on the buffer per pixel here "    inc     byte ptr [edi+eax]      ; Populate the Median buffer with the values as index numbers "

If there are, let´s say, a input containing 7000 numbers and from this input, 1000 numbers are the value 120, then the array buffer will extrapolate the  limit used on the byte storage. The solution is simply making the table contains dwords rather then bytes, so it could store  4294967295 repetitions per pixel. Not sure how it will affect the speed, though  :eusa_pray:

Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

TimoVJL

As guga have AMD Ryzen, so
AMD Ryzen 5 3400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
1 ms for sorting, the real median is 44672
Speed gain is a factor 3.0, precision loss is -0.00703%
100000 numbers generated
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
8 ms for sorting, the real median is 48801
Speed gain is a factor 8.0, precision loss is -0.00307%
1000000 numbers generated
8 ms to find the median of 1 Mio items, which is 93879
9 ms to find the median of 1 Mio items, which is 93879
9 ms to find the median of 1 Mio items, which is 93879
90 ms for sorting, the real median is 93880
Speed gain is a factor 10., precision loss is -0.00122%
10000000 numbers generated
91 ms to find the median of 10 Mio items, which is 543842
93 ms to find the median of 10 Mio items, which is 543842
93 ms to find the median of 10 Mio items, which is 543842
916 ms for sorting, the real median is 543842
Speed gain is a factor 9.9, precision loss is -4.60e-05%
100000000 numbers generated
937 ms to find the median of 100 Mio items, which is 5043857
937 ms to find the median of 100 Mio items, which is 5043857
937 ms to find the median of 100 Mio items, which is 5043857
7719 ms for sorting, the real median is 5043855
Speed gain is a factor 8.2, precision loss is 2.97e-05%
May the source be with you

jj2007

Quote from: guga on July 24, 2020, 02:51:25 AMQuestion. When you inputed the random numbers you limited them to positive integers from 0 to 255, right ? And also limited the amount of repetitions to a max of 255 too ? I´m asking because, if i understood correctly, the algo has a input limit of 255 that can be feeded on the buffer per pixel here "    inc     byte ptr [edi+eax]      ; Populate the Median buffer with the values as index numbers "

If there are, let´s say, a input containing 7000 numbers and from this input, 1000 numbers are the value 120, then the array buffer will extrapolate the  limit used on the byte storage. The solution is simply making the table contains dwords rather then bytes, so it could store  4294967295 repetitions per pixel. Not sure how it will affect the speed, though  :eusa_pray:

Check MbMedianP proc. I am already using DWORDs, as you suggested :cool:

Siekmanski

I'm missing a test for byte values between 0 - 255.
Lets say 1920 * 1080 = 2073600 byte sized numbers, and a 256 DWORD sized buffer.

Then it should perform at its best.
It's designed especially for these specs.   :cool:
Creative coders use backward thinking techniques as a strategy.

nidud

#9
deleted

jj2007

Quote from: nidud on July 24, 2020, 04:08:43 AMSo it handles negative values but fails on even counts.

How does it "fail"? I see consistent results for odd and even element counts.

Besides, MbMedian is no longer the algo of choice, since Marinus' algo is a lot faster. Check MbMedianP

nidud

#11
deleted

jj2007

Quote from: Siekmanski on July 24, 2020, 04:05:47 AM
I'm missing a test for byte values between 0 - 255.

I've modified your algo so it becomes an allrounder (see MbMedianP): integers, REALx, negative values etc. Byte values is a special case then, but it will be somewhat slower than a dedicated byte-sized algo.

jj2007

New test, now alternating odd and even element counts, and two runs for small numbers. It looks promising :tongue:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.124, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.81, precision loss is -0.0145%

1000000 numbers generated
11 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
119 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.2, precision loss is 0.0205%

10000001 numbers generated
117 ms to find the median of 10 Mio items, which is 26505.42
115 ms to find the median of 10 Mio items, which is 26505.42
120 ms to find the median of 10 Mio items, which is 26505.42
1331 ms for sorting, the real median is 26510.57
Speed gain is a factor 11.3, precision loss is -0.0194%

100000010 numbers generated
1165 ms to find the median of 100 Mio items, which is 26505.42
1175 ms to find the median of 100 Mio items, which is 26505.42
1180 ms to find the median of 100 Mio items, which is 26505.42
11945 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.2, precision loss is -0.0190%

guga

Awesome :eusa_dance: :eusa_dance: :eusa_dance: :eusa_dance:



159     bytes for MbMedianP
AMD Ryzen 5 2400G with Radeon Vega Graphics     (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting

10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0297, precision loss is 0.00275%

101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.40, precision loss is -0.0145%

1000000 numbers generated
6 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
9 ms to find the median of 1 Mio items, which is 26572.96
110 ms for sorting, the real median is 26567.51
Speed gain is a factor 13.8, precision loss is 0.0205%

10000001 numbers generated
153 ms to find the median of 10 Mio items, which is 26505.42
133 ms to find the median of 10 Mio items, which is 26505.42
108 ms to find the median of 10 Mio items, which is 26505.42
1381 ms for sorting, the real median is 26510.57
Speed gain is a factor 10.5, precision loss is -0.0194%

100000010 numbers generated
1277 ms to find the median of 100 Mio items, which is 26505.42
1212 ms to find the median of 100 Mio items, which is 26505.42
1129 ms to find the median of 100 Mio items, which is 26505.42
12886 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.7, precision loss is -0.0190%
hit any key                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     



It´s working for integers, real4 and negative too ? :dazzled: :dazzled: :dazzled: :dazzled: :eusa_clap: :eusa_clap: :eusa_clap: :eusa_clap:

Which version ? This ? "MbMedianP_s"

If it is, did you checked the results ? I would like to give a test on other routines to see how to adapt to the algo we are creating for the image processing.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com