The MASM Forum
General => The Laboratory => Topic started by: jj2007 on July 24, 2020, 12:30:27 AM

May I have some timings please? This is related to Guga's thread, the algo is ruthlessly stolen and modified from Marinus :cool:
Intel(R) Core(TM) i52450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
1 ms to find the median of 0 Mio items, which is 44669
1 ms to find the median of 0 Mio items, which is 44669
4 ms for sorting, the real median is 44672
Speed gain is a factor 4.0, precision loss is 0.00703%
100000 numbers generated
11 ms to find the median of 0 Mio items, which is 48800
5 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
17 ms for sorting, the real median is 48801
Speed gain is a factor 2.5, precision loss is 0.00307%
1000000 numbers generated
14 ms to find the median of 1 Mio items, which is 93879
14 ms to find the median of 1 Mio items, which is 93879
15 ms to find the median of 1 Mio items, which is 93879
127 ms for sorting, the real median is 93880
Speed gain is a factor 8.9, precision loss is 0.00122%
10000000 numbers generated
146 ms to find the median of 10 Mio items, which is 543842
151 ms to find the median of 10 Mio items, which is 543842
142 ms to find the median of 10 Mio items, which is 543842
1284 ms for sorting, the real median is 543842
Speed gain is a factor 8.8, precision loss is 4.60e05%
100000000 numbers generated
1432 ms to find the median of 100 Mio items, which is 5043857
1429 ms to find the median of 100 Mio items, which is 5043857
1437 ms to find the median of 100 Mio items, which is 5043857
11089 ms for sorting, the real median is 5043855
Speed gain is a factor 7.7, precision loss is 2.97e05%

:biggrin: Share & Enjoy
It couldn't open test.int
This is what I got so far.
Intel(R) Core(TM) i74930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
2 ms for sorting, the real median is 44672
Speed gain is a factor 6.0, precision loss is 0.00703%
100000 numbers generated
2 ms to find the median of 0 Mio items, which is 48800
2 ms to find the median of 0 Mio items, which is 48800
2 ms to find the median of 0 Mio items, which is 48800
28 ms for sorting, the real median is 48801
Speed gain is a factor 14., precision loss is 0.00307%
1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
111 ms for sorting, the real median is 93880
Speed gain is a factor 16., precision loss is 0.00122%
10000000 numbers generated
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
1115 ms for sorting, the real median is 543842
Speed gain is a factor 16., precision loss is 4.60e05%
100000000 numbers generated
702 ms to find the median of 100 Mio items, which is 5043857
691 ms to find the median of 100 Mio items, which is 5043857
694 ms to find the median of 100 Mio items, which is 5043857
9492 ms for sorting, the real median is 5043855
Speed gain is a factor 14., precision loss is 2.97e05%

A factor 14 for 100 Mio items  nice :thumbsup:

factor of 18 is great :thumbsup:
Intel(R) Core(TM) i57200U CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
2 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
5 ms for sorting, the real median is 44672
Speed gain is a factor 7.5, precision loss is 0.00703%
100000 numbers generated
6 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
4 ms to find the median of 0 Mio items, which is 48800
27 ms for sorting, the real median is 48801
Speed gain is a factor 5.8, precision loss is 0.00307%
1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
7 ms to find the median of 1 Mio items, which is 93879
125 ms for sorting, the real median is 93880
Speed gain is a factor 18., precision loss is 0.00122%
10000000 numbers generated
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
70 ms to find the median of 10 Mio items, which is 543842
1229 ms for sorting, the real median is 543842
Speed gain is a factor 18., precision loss is 4.60e05%
100000000 numbers generated
678 ms to find the median of 100 Mio items, which is 5043857
673 ms to find the median of 100 Mio items, which is 5043857
673 ms to find the median of 100 Mio items, which is 5043857
10262 ms for sorting, the real median is 5043855
Speed gain is a factor 15., precision loss is 2.97e05%

CalculateMedian it's failling because some lines have dropped somewhere :biggrin::
FindMedianLP:
inc eax
movzx ecx,byte ptr[edi+eax] ; Count of single and/or duplicate numbers. ( zero is not a number )
add edx,ecx ; Add it to the Median counter.
cmp edx,ebx ; Compare if we reached the middle of the PopulationSize.
jbe FindMedianLP ; Repeat until we reached the middle.
dec esi ; Check even or uneven.
js Ready ; Ready if uneven.
mov ecx, edx < missing
sub ecx, ebx < missing
cmp ecx,1 ; Check if next number is a duplicate. ( no averaging needed )
ja Ready ; Ready if next number is the same value as the previous number.
mov edx,eax ; Save the first middle number to calculate the average of 2 middle numbers.
FindSecondMiddleNumberLP:

This is the results i´ve got :greenclp: :greenclp: :greenclp: :greenclp:
AMD Ryzen 5 2400G with Radeon Vega Graphics (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
2 ms for sorting, the real median is 44672
Speed gain is a factor 6.0, precision loss is 0.00703%
100000 numbers generated
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
18 ms for sorting, the real median is 48801
Speed gain is a factor 18., precision loss is 0.00307%
1000000 numbers generated
9 ms to find the median of 1 Mio items, which is 93879
15 ms to find the median of 1 Mio items, which is 93879
19 ms to find the median of 1 Mio items, which is 93879
149 ms for sorting, the real median is 93880
Speed gain is a factor 10., precision loss is 0.00122%
10000000 numbers generated
103 ms to find the median of 10 Mio items, which is 543842
123 ms to find the median of 10 Mio items, which is 543842
145 ms to find the median of 10 Mio items, which is 543842
1137 ms for sorting, the real median is 543842
Speed gain is a factor 9.2, precision loss is 4.60e05%
100000000 numbers generated
1261 ms to find the median of 100 Mio items, which is 5043857
1143 ms to find the median of 100 Mio items, which is 5043857
1244 ms to find the median of 100 Mio items, which is 5043857
10776 ms for sorting, the real median is 5043855
Speed gain is a factor 8.9, precision loss is 2.97e05%
Question. When you inputed the random numbers you limited them to positive integers from 0 to 255, right ? And also limited the amount of repetitions to a max of 255 too ? I´m asking because, if i understood correctly, the algo has a input limit of 255 that can be feeded on the buffer per pixel here " inc byte ptr [edi+eax] ; Populate the Median buffer with the values as index numbers "
If there are, let´s say, a input containing 7000 numbers and from this input, 1000 numbers are the value 120, then the array buffer will extrapolate the limit used on the byte storage. The solution is simply making the table contains dwords rather then bytes, so it could store 4294967295 repetitions per pixel. Not sure how it will affect the speed, though :eusa_pray:

As guga have AMD Ryzen, so
AMD Ryzen 5 3400G with Radeon Vega Graphics (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
10000 numbers generated
1 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
0 ms to find the median of 0 Mio items, which is 44669
1 ms for sorting, the real median is 44672
Speed gain is a factor 3.0, precision loss is 0.00703%
100000 numbers generated
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
1 ms to find the median of 0 Mio items, which is 48800
8 ms for sorting, the real median is 48801
Speed gain is a factor 8.0, precision loss is 0.00307%
1000000 numbers generated
8 ms to find the median of 1 Mio items, which is 93879
9 ms to find the median of 1 Mio items, which is 93879
9 ms to find the median of 1 Mio items, which is 93879
90 ms for sorting, the real median is 93880
Speed gain is a factor 10., precision loss is 0.00122%
10000000 numbers generated
91 ms to find the median of 10 Mio items, which is 543842
93 ms to find the median of 10 Mio items, which is 543842
93 ms to find the median of 10 Mio items, which is 543842
916 ms for sorting, the real median is 543842
Speed gain is a factor 9.9, precision loss is 4.60e05%
100000000 numbers generated
937 ms to find the median of 100 Mio items, which is 5043857
937 ms to find the median of 100 Mio items, which is 5043857
937 ms to find the median of 100 Mio items, which is 5043857
7719 ms for sorting, the real median is 5043855
Speed gain is a factor 8.2, precision loss is 2.97e05%

Question. When you inputed the random numbers you limited them to positive integers from 0 to 255, right ? And also limited the amount of repetitions to a max of 255 too ? I´m asking because, if i understood correctly, the algo has a input limit of 255 that can be feeded on the buffer per pixel here " inc byte ptr [edi+eax] ; Populate the Median buffer with the values as index numbers "
If there are, let´s say, a input containing 7000 numbers and from this input, 1000 numbers are the value 120, then the array buffer will extrapolate the limit used on the byte storage. The solution is simply making the table contains dwords rather then bytes, so it could store 4294967295 repetitions per pixel. Not sure how it will affect the speed, though :eusa_pray:
Check MbMedianP proc. I am already using DWORDs, as you suggested :cool:

I'm missing a test for byte values between 0  255.
Lets say 1920 * 1080 = 2073600 byte sized numbers, and a 256 DWORD sized buffer.
Then it should perform at its best.
It's designed especially for these specs. :cool:

Timings for MbMedian
Negative values:
47827 cycles 0.asm: median
2753682 cycles 1.asm: MbMedian
Positive values:
48639 cycles 0.asm: median
2460688 cycles 1.asm: MbMedian
So it handles negative values but fails on even counts.

So it handles negative values but fails on even counts.
How does it "fail"? I see consistent results for odd and even element counts.
Besides, MbMedian is no longer the algo of choice, since Marinus' algo is a lot faster. Check MbMedianP

So it handles negative values but fails on even counts.
How does it "fail"? I see consistent results for odd and even element counts.
Input:(120,121,122,123,124,110,100,90,80,70,60,50,40,30,20,10) > [90,80] > 85. The returned value from Mb is 80.
Besides, MbMedian is no longer the algo of choice, since Marinus' algo is a lot faster. Check MbMedianP
timeit.asm:
procs equ <for x,<0,1,2>> ; add functions to test...
.data
info_0 db "median",0
info_1 db "MbMedian",0
info_2 db "MbMedianP",0
Dump the algo into 2.asm and see how it goes.

I'm missing a test for byte values between 0  255.
I've modified your algo so it becomes an allrounder (see MbMedianP): integers, REALx, negative values etc. Byte values is a special case then, but it will be somewhat slower than a dedicated bytesized algo.

New test, now alternating odd and even element counts, and two runs for small numbers. It looks promising :tongue:
Intel(R) Core(TM) i52450M CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting
10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.124, precision loss is 0.00275%
101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.81, precision loss is 0.0145%
1000000 numbers generated
11 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
119 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.2, precision loss is 0.0205%
10000001 numbers generated
117 ms to find the median of 10 Mio items, which is 26505.42
115 ms to find the median of 10 Mio items, which is 26505.42
120 ms to find the median of 10 Mio items, which is 26505.42
1331 ms for sorting, the real median is 26510.57
Speed gain is a factor 11.3, precision loss is 0.0194%
100000010 numbers generated
1165 ms to find the median of 100 Mio items, which is 26505.42
1175 ms to find the median of 100 Mio items, which is 26505.42
1180 ms to find the median of 100 Mio items, which is 26505.42
11945 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.2, precision loss is 0.0190%

Awesome :eusa_dance: :eusa_dance: :eusa_dance: :eusa_dance:
159 bytes for MbMedianP
AMD Ryzen 5 2400G with Radeon Vega Graphics (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting
10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0297, precision loss is 0.00275%
101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.40, precision loss is 0.0145%
1000000 numbers generated
6 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
9 ms to find the median of 1 Mio items, which is 26572.96
110 ms for sorting, the real median is 26567.51
Speed gain is a factor 13.8, precision loss is 0.0205%
10000001 numbers generated
153 ms to find the median of 10 Mio items, which is 26505.42
133 ms to find the median of 10 Mio items, which is 26505.42
108 ms to find the median of 10 Mio items, which is 26505.42
1381 ms for sorting, the real median is 26510.57
Speed gain is a factor 10.5, precision loss is 0.0194%
100000010 numbers generated
1277 ms to find the median of 100 Mio items, which is 26505.42
1212 ms to find the median of 100 Mio items, which is 26505.42
1129 ms to find the median of 100 Mio items, which is 26505.42
12886 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.7, precision loss is 0.0190%
hit any key
It´s working for integers, real4 and negative too ? :dazzled: :dazzled: :dazzled: :dazzled: :eusa_clap: :eusa_clap: :eusa_clap: :eusa_clap:
Which version ? This ? "MbMedianP_s"
If it is, did you checked the results ? I would like to give a test on other routines to see how to adapt to the algo we are creating for the image processing.

159 bytes for MbMedianP
AMD Ryzen 5 3400G with Radeon Vega Graphics (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting
10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0191, precision loss is 0.00275%
101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.67, precision loss is 0.0145%
1000000 numbers generated
7 ms to find the median of 1 Mio items, which is 26572.96
11 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
100 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.8, precision loss is 0.0205%
10000001 numbers generated
91 ms to find the median of 10 Mio items, which is 26505.42
86 ms to find the median of 10 Mio items, which is 26505.42
86 ms to find the median of 10 Mio items, which is 26505.42
1051 ms for sorting, the real median is 26510.57
Speed gain is a factor 12.0, precision loss is 0.0194%
100000010 numbers generated
933 ms to find the median of 100 Mio items, which is 26505.42
929 ms to find the median of 100 Mio items, which is 26505.42
931 ms to find the median of 100 Mio items, which is 26505.42
9461 ms for sorting, the real median is 26510.47
Speed gain is a factor 10.2, precision loss is 0.0190%
hit any key

It´s working for integers, real4 and negative too ?
Yes, but I still have to adapt it for other formats. Right now it handles only REAL4, negative numbers included.
It's MbMedianP in the source.

159 bytes for MbMedianP
Intel(R) Core(TM) i74930K CPU @ 3.40GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing
10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.126, precision loss is 0.00275%
101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.10, precision loss is 0.0145%
1000000 numbers generated
4 ms to find the median of 1 Mio items, which is 26572.96
4 ms to find the median of 1 Mio items, which is 26572.96
4 ms to find the median of 1 Mio items, which is 26572.96
105 ms for sorting, the real median is 26567.51
Speed gain is a factor 23.5, precision loss is 0.0205%
10000001 numbers generated
41 ms to find the median of 10 Mio items, which is 26505.42
41 ms to find the median of 10 Mio items, which is 26505.42
41 ms to find the median of 10 Mio items, which is 26505.42
1160 ms for sorting, the real median is 26510.57
Speed gain is a factor 28.1, precision loss is 0.0194%
100000010 numbers generated
421 ms to find the median of 100 Mio items, which is 26505.42
420 ms to find the median of 100 Mio items, which is 26505.42
416 ms to find the median of 100 Mio items, which is 26505.42
10341 ms for sorting, the real median is 26510.47
Speed gain is a factor 24.6, precision loss is 0.0190%
hit any key

For what i saw we need to previously know the max and minimum values, right ?
Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).
The algo is calculating the middle of the table and getting the value of the last one in the table from the left, right ? I notice a small mistake when getting positive integers numbers.
At "FindSecondMiddleNumberLP" it is needed to change this:
mov edx, eax ; Save the first middle number to calculate the average of 2 middle numbers.
FindSecondMiddleNumberLP: ; pretty rare case, mostly with small data sets
inc eax
cmp dword ptr[edi+4*eax], 1 ; ecx must be 1
jb FindSecondMiddleNumberLP
add eax, edx ; Second middle number found, add it to the first middle number.
shr eax, 1 ; And get the average.
To something like this
mov edx, eax ; Save the first middle number to calculate the average of 2 middle numbers.
inc eax
FindSecondMiddleNumberLP: ; pretty rare case, mostly with small data sets
cmp dword ptr[edi+4*eax], 0 ; Search if the next address (so, the 1st one in the right table) is not empty
jne FoundFirstMiddleNumberRight ; If the address is already filled, exit the loop
inc eax ; search next address and jump over when the address at this position in the table is filled
jmp FindSecondMiddleNumberLP ; < Sorry, forgot to port this when i was editing this post
FoundFirstMiddleNumberRight:
add eax, edx ; Second middle number found, add it to the first middle number.
shr eax, 1 ; And get the average.
I believe it is the proper masm version. In RosAsm, it is:
mov edx eax ; Save the first middle number to calculate the average of 2 middle numbers
; here we are always at the end of the left middle. Ex if size = 100, we are at pos 49th
; FindSecondMiddleNumberLP.
inc eax
while D$edi+eax*4 = 0
inc eax
End_while
add eax edx
shr eax 1 ; And get the average.
You can test it on these values (Positive integers):
[DataSelect9: D$ 136, 151, 216, 40, 13, 215, 24, 104,
230, 154, 227, 219, 52, 217, 38, 131,
228, 95, 178, 51, 70, 14, 9, 34,
80, 195, 59, 27, 100, 118, 78, 137,
65, 104, 7, 32, 23, 58, 202, 154,
238, 12, 11, 165, 223, 197, 98, 26,
231, 92, 29, 131, 25, 99, 97, 77,
155, 120, 154, 182, 177, 211, 103, 238,
209, 86, 205, 213, 180, 126, 55, 95,
186, 232, 196, 121, 237, 187, 26, 120,
13, 56, 186, 167, 14, 136, 90, 196,
120, 23, 190, 79, 164, 41, 239, 39,
18, 109, 35, 169] ; Size = 100, Median = 119
The reason is that when you see the data in the proper order, you will see something like this:
Left Middle Table
0 ___________________________49 (Position starting from zero)
7 9 11 12 13 13 (...) 118 < marinus code found this value when reached FindSecondMiddleNumberLP
Right Middle Table
50 ___________________________100 (Position starting from 50)
120 120 120 121 (...) 238 238 239
________

and we need to find this
Since we have a gap of 2 (120118), it means that the next address on the right that contains a value is the one 2 dword ahead. Since we already incremented this value at eax, we must 1st see if that address is not empty. If it is empty, we perform the loop untill the next address that is full and increment on each loop.
Can you please, test to see if this is correct. On the values i tested above, it found the proper median. (On positive integers it seems to work. Not sure how it will behave in negative or floats)

> Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).
My routine is not the fault, must be something else. :biggrin:
compare this part with with your conversion:
FindSecondMiddleNumberLP:
inc eax
mov ecx,[edi+eax*4]
test ecx,ecx ; If zero, its not a number.
jz FindSecondMiddleNumberLP

159 bytes for MbMedianP
Intel(R) Core(TM) i57200U CPU @ 2.50GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sorting
10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.0437, precision loss is 0.00275%
101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 2.80, precision loss is 0.0145%
1000000 numbers generated
5 ms to find the median of 1 Mio items, which is 26572.96
5 ms to find the median of 1 Mio items, which is 26572.96
5 ms to find the median of 1 Mio items, which is 26572.96
127 ms for sorting, the real median is 26567.51
Speed gain is a factor 22.9, precision loss is 0.0205%
10000001 numbers generated
56 ms to find the median of 10 Mio items, which is 26505.42
56 ms to find the median of 10 Mio items, which is 26505.42
56 ms to find the median of 10 Mio items, which is 26505.42
1357 ms for sorting, the real median is 26510.57
Speed gain is a factor 24.0, precision loss is 0.0194%
100000010 numbers generated
560 ms to find the median of 100 Mio items, which is 26505.42
565 ms to find the median of 100 Mio items, which is 26505.42
552 ms to find the median of 100 Mio items, which is 26505.42
12239 ms for sorting, the real median is 26510.47
Speed gain is a factor 21.9, precision loss is 0.0190%
hit any key

Hi,
Two results, the second may have run into trouble? (Memory?)
159 bytes for MbMedianP
Intel(R) Core(TM) i34005U CPU @ 1.70GHz (MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, S
SE4.2, AVX)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing
10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 4.37, precision loss is 0.00275%
101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.89, precision loss is 0.0145%
1000000 numbers generated
8 ms to find the median of 1 Mio items, which is 26572.96
8 ms to find the median of 1 Mio items, which is 26572.96
7 ms to find the median of 1 Mio items, which is 26572.96
214 ms for sorting, the real median is 26567.51
Speed gain is a factor 26.7, precision loss is 0.0205%
10000001 numbers generated
83 ms to find the median of 10 Mio items, which is 26505.42
82 ms to find the median of 10 Mio items, which is 26505.42
82 ms to find the median of 10 Mio items, which is 26505.42
2536 ms for sorting, the real median is 26510.57
Speed gain is a factor 30.6, precision loss is 0.0194%
100000010 numbers generated
839 ms to find the median of 100 Mio items, which is 26505.42
828 ms to find the median of 100 Mio items, which is 26505.42
842 ms to find the median of 100 Mio items, which is 26505.42
21065 ms for sorting, the real median is 26510.47
Speed gain is a factor 25.2, precision loss is 0.0190%
hit any key
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 19852001 Microsoft Corp.
C:\Documents and Settings\Stephan R. Novosad>ncd f:test
NCDNorton Change Directory, Advanced Edition, (C) Copr 1987, Peter Norton
Changing to F:\TEMP\TEST
F:\TEMP\TEST>arraymed
159 bytes for MbMedianP
Intel(R) Pentium(R) M processor 1.70GHz (MMX, SSE, SSE2)
Precision loss is calculated versus the 'real' median, which is obtained by sort
ing
10 numbers generated
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms to find the median of 10 items, which is 30169.25
0 ms for sorting, the real median is 30168.43
Speed gain is a factor 0.145, precision loss is 0.00275%
101 numbers generated
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms to find the median of 101 items, which is 27143.74
0 ms for sorting, the real median is 27147.66
Speed gain is a factor 1.14, precision loss is 0.0145%
1000000 numbers generated
22 ms to find the median of 1 Mio items, which is 26572.96
22 ms to find the median of 1 Mio items, which is 26572.96
22 ms to find the median of 1 Mio items, which is 26572.96
236 ms for sorting, the real median is 26567.51
Speed gain is a factor 10.6, precision loss is 0.0205%
10000001 numbers generated
225 ms to find the median of 10 Mio items, which is 26505.42
223 ms to find the median of 10 Mio items, which is 26505.42
222 ms to find the median of 10 Mio items, which is 26505.42
2545 ms for sorting, the real median is 26510.57
Speed gain is a factor 11.4, precision loss is 0.0194%
100000010 numbers generated
21138 ms to find the median of 100 Mio items, which is 26505.42
2266 ms to find the median of 100 Mio items, which is 26505.42
2240 ms to find the median of 100 Mio items, which is 26505.42
23114 ms for sorting, the real median is 26510.47
Speed gain is a factor 2.70, precision loss is 0.0190%
hit any key
Regards,
Steve

Steve,
You mean this one? Very slow but otherwise it looks ok...
100000010 numbers generated
21138 ms to find the median of 100 Mio items, which is 26505.42
2266 ms to find the median of 100 Mio items, which is 26505.42

Hi Jochen,
Yes, and the system became very sluggish and
somewhat unresponsive. It is a fairly old laptop
though.
Regards,
Steve

That's odd, because after the string "numbers generated" there is no HeapAlloc or similar that could slow down the system. The algo itself does not allocate anything. Is this behaviour reproducible? If it's a single core CPU, it could have been busy with other tasks...

Hi,
That's odd, because after the string "numbers generated" there is no HeapAlloc or similar that could slow down the system. The algo itself does not allocate anything. Is this behaviour reproducible?
Sort of reproducible. Two of around six runs below.
Longest was ~30xxx ms, the shortest ~67xx ms.
100000010 numbers generated
16467 ms to find the median of 100 Mio items, which is 26505.42
2245 ms to find the median of 100 Mio items, which is 26505.42
2233 ms to find the median of 100 Mio items, which is 26505.42
100000010 numbers generated
20118 ms to find the median of 100 Mio items, which is 26505.42
2235 ms to find the median of 100 Mio items, which is 26505.42
2237 ms to find the median of 100 Mio items, which is 26505.42
If it's a single core CPU, it could have been busy with other tasks...
Yes a single core CPU. Sluggishness seems to
persist until rebooted. Somewhere between weird and
strange.
Regards,
Steve

hmmm... the 100 Mio items mean 400MB on the heap. But that's not the algo, it's just the creation of the array. Weird indeed.

> Btw...one small fix on Marinus version (perhaps on yours too it have the same issue).
My routine is not the fault, must be something else. :biggrin:
compare this part with with your conversion:
FindSecondMiddleNumberLP:
inc eax
mov ecx,[edi+eax*4]
test ecx,ecx ; If zero, its not a number.
jz FindSecondMiddleNumberLP
Indeed. I mistakes the masm routines. Now it works like a gem :greensml: :greensml: :greensml:

I stumbled over a set of data for which Marinus' (modified, using DWORDs) algo fails with an error of 3%: the algo returns 37386, but it should be 38532 :cool:
TestData dd 1332, 1575, 3891, 4226, 5629, 6868, 7295, 10283, 10375, 11866
dd 12064, 12231, 12895, 13365, 14384, 14490, 14501, 15995, 17093, 17275
dd 17833, 18070, 19096, 19988, 20168, 20765, 20861, 21012, 21344, 22404
dd 23377, 23947, 24033, 24401, 24709, 25594, 26025, 26058, 26928, 27084
dd 27571, 28237, 28792, 29066, 30189, 30724, 32569, 32765, 33198, 34558
dd 36270, 36987, 37167, 37288, 37573, 39491, 78260, 193983, 312869, 406851
dd 474584, 878091, 933154, 1535982, 1696834, 1826755, 1894374, 1935375, 1979114, 1980304
dd 1995476, 2163480, 2186807, 2364281, 2775589, 3086356, 3163282, 3296671, 3324133, 3368259
dd 3445517, 3506669, 3672249, 3797020, 4185277, 4691675, 5316867, 5403458, 6181758, 6361385
dd 6439150, 6544605, 6547329, 6819265, 6928806, 6963353, 7069587, 7384388, 7574183, 7697938
dd 7856932, 8106433, 8645157, 8771719, 8886262, 9014975, 9177706, 9481844, 9485408, 9488645

My routine is not to blame, must be something else. :cool:

My routine is not to blame, must be something else. :cool:
You do it without normalisation: mov eax, (9488645+1) * 4, and that works :sad:
It's not a blame game, Marinus, I am just trying to understand why my version misbehaves for that specific case. Usually the deviation is in the 0.02% range, here it is 3%, a factor 150 higher...
Most probably it has to with resolution. The range here is 1332...9488645, and the real median is 38532, with is 0.4% of the max value. The distribution looks a bit extreme, see below. If I increase the buffer to 400,000 bytes, all is fine again.
179 bytes for ArrInf1
Intel(R) Core(TM) i52450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs
11 numbers generated
0 ms to find the median of 11 items, which is 31328.00
0 ms to find the median of 11 items, which is 31328.00
0 ms to find the median of 11 items, which is 31328.00
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0245, precision loss is 0.0670%
110 numbers generated
0 ms to find the median of 110 items, which is 38522.00
0 ms to find the median of 110 items, which is 38522.00
0 ms to find the median of 110 items, which is 38522.00
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.457, precision loss is 0.0260%
1000001 numbers generated
5 ms to find the median of 1 Mio items, which is 39751.00
5 ms to find the median of 1 Mio items, which is 39751.00
5 ms to find the median of 1 Mio items, which is 39751.00
100 ms for sorting, the real median is 39792.00
Speed gain is a factor 17.1, precision loss is 0.103%
10000010 numbers generated
60 ms to find the median of 10 Mio items, which is 39825.00
57 ms to find the median of 10 Mio items, which is 39825.00
57 ms to find the median of 10 Mio items, which is 39825.00
1048 ms for sorting, the real median is 39796.00
Speed gain is a factor 17.9, precision loss is 0.0729%
100000101 numbers generated
615 ms to find the median of 100 Mio items, which is 39820.00
587 ms to find the median of 100 Mio items, which is 39820.00
591 ms to find the median of 100 Mio items, which is 39820.00
9960 ms for sorting, the real median is 39796.00
Speed gain is a factor 16.7, precision loss is 0.0603%

That wasn't my intention, probably misunderstood the situation.
Sorry, if I gave you that feeling.
I just allocated the minimum buffer size for the data set.
What do you mean exactly with the normalisation?

No bad feeling here, Marinus :biggrin:
What I am doing here is to adapt the algo to any range of values by normalising it:
aiPop:
if aiInt
fild aiSize$ PTR [esi+aiSize*ecx] ; get original integer
else
fld aiSize$ PTR [esi+aiSize*ecx] ; get original real
endif
fsub arrInf.yMin ; make positive
fmul arrInf.yRange ; normalise 0...99999
fistp arrInf.yIndex ; we need a
mov eax, arrInf.yIndex ; reg32
inc dword ptr [edi+4*eax] ; Populate the Median buffer with the values as index numbers and keep track for duplicates in one go
dec ecx
jns aiPop
That works fine, but (see image above) for very skewed distributions the deviation can be significant.
Results for the latest version (attached):
Intel(R) Core(TM) i52450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs
11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0149, precision loss is 0.0670%
110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.444, precision loss is 0.0260%
1000001 numbers generated
8 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
7 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
8 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
112 ms for sorting, the real median is 39792.00
Speed gain is a factor 13.4, precision loss is 0.103%
10000010 numbers generated
83 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
87 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
84 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
1146 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.5, precision loss is 0.0729%
100000101 numbers generated
848 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
842 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
841 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
10829 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.8, precision loss is 0.0603%

:thumbsup:
Hi Jochen,
Now I understand what your goal is.
What about this idea to speed it up multiple times.
Multithreading.
Divide the data set in equal parts ( / max cpu thread count )
1 Median Buffer for each thread, when done adding the buffers together per index number and find the median, will speed it up.

Well.... at 842 ms to find the median of 100 Mio items, I am not eager to dive into the secrecies of multithreaded programs ;)
Google How do you find the median of a large data set? to see this (https://www.dummies.com/education/math/statistics/howtofindthemedianvalueinastatisticaldataset/):
Order the values from smallest to largest.
If the data set contains an odd number of values , choose the one that is exactly in the middle. You've found the median.
If the data set contains an even number of values , take the two values that appear in the middle and average them to find the median.
So they think sorting is the way to go, which implies that their idea of a "large" data set is different from ours :tongue:

Sometimes, thinking outside of the box can give you a reward.
Most ideas are based on other ideas without understanding why the first idea was constructed that way.
Many ancient fast routines were designed to work for small memory sizes and slow memory reads and writes. Not a problem anymore.
I came up with this idea when I wrote a routine to find the unique number count of colors in a bitmap image.
When guga needed a fast median routine, I knew this would be fast, especially in a small range ( 0255 ) and millions of colors.
Later it turned out to be for only 100 colors per iteration.

This is the goal :cool:
Dim MyArray() As REAL8 ; or DWORD, REAL4, ...
... fill it with values ...
Print Str$("The array has %i elements\n", MyArray(?))
Print Str$("The median is %f (a proxy, typically 0.02% off) \n", MyArray(median?))
Print Str$("The average is %f\n", MyArray(average?))
Print Str$("The minimum is %f\n", MyArray(min?))
Print Str$("The maximum is %f\n", MyArray(max?))

I think multithreading could be a good idea
maybe would be useful for Guga to have extra thread dedicated to filereading/filewriting in background,instead of main thread switches between fast algos running median/image/videoframe processing and to slow write processed images/data to disk when RAM is filled and readin unprocessed images/data ,to start the main loop again

Multithread could be a good way to go, but 1st we need to see if the watermark remover algo will work as expected (i presume it will, but we didn´t get there yet :greensml: :greensml: :greensml:)
Btw....Someone has the timming of quicksort algo ? I would like to compare it with one thing i´m finishing here and will post soon :azn: :azn: :azn:.

In the MASM lib you can find Qsort routines.

Btw....Someone has the timming of quicksort algo ?
No idea what you are talking about (http://masm32.com/board/index.php?topic=3643.msg38433#msg38433) :biggrin:

HI Marinus...I know, but i wanted just to compare the results of qsort and another sorting algo called Bitonic sort. I opened a thread and posted the source (in masm too) to benchmark later.
Btw... How do i exchange the order of 2 floats in SSE from different registers ? Ex:
xmm1 = 1, 2, 3, 4
xmm2 = 5, 6, 7, 8
How do i exchange the order of the low and hi qword on both of them at once in a way that result in:
xmm1 = 1, 2, 5, 6
xmm2 = 3, 4, 6, 7

Btw....Someone has the timming of quicksort algo ?
No idea what you are talking about (http://masm32.com/board/index.php?topic=3643.msg38433#msg38433) :biggrin:
Wow...I didn´t remember that
:biggrin: :biggrin: :biggrin: :biggrin: :biggrin: :bgrin: :bgrin: :bgrin:

No bad feeling here, Marinus :biggrin:
What I am doing here is to adapt the algo to any range of values by normalising it:
aiPop:
if aiInt
fild aiSize$ PTR [esi+aiSize*ecx] ; get original integer
else
fld aiSize$ PTR [esi+aiSize*ecx] ; get original real
endif
fsub arrInf.yMin ; make positive
fmul arrInf.yRange ; normalise 0...99999
fistp arrInf.yIndex ; we need a
mov eax, arrInf.yIndex ; reg32
inc dword ptr [edi+4*eax] ; Populate the Median buffer with the values as index numbers and keep track for duplicates in one go
dec ecx
jns aiPop
That works fine, but (see image above) for very skewed distributions the deviation can be significant.
Results for the latest version (attached):
Intel(R) Core(TM) i52450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs
11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0149, precision loss is 0.0670%
110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.444, precision loss is 0.0260%
1000001 numbers generated
8 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
7 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
8 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
112 ms for sorting, the real median is 39792.00
Speed gain is a factor 13.4, precision loss is 0.103%
10000010 numbers generated
83 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
87 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
84 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
1146 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.5, precision loss is 0.0729%
100000101 numbers generated
848 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
842 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
841 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
10829 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.8, precision loss is 0.0603%
179 bytes for ArrInf1
Intel(R) Core(TM) i74810MQ CPU @ 2.80GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs
11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0166, precision loss is 0.0670%
110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.488, precision loss is 0.0260%
1000001 numbers generated
7 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
6 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
6 ms to find the median of 1 Mio items, which is 39751.00 (min=12313, max=9999999)
91 ms for sorting, the real median is 39792.00
Speed gain is a factor 14.1, precision loss is 0.103%
10000010 numbers generated
70 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
72 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
72 ms to find the median of 10 Mio items, which is 39825.00 (min=12340, max=9999995)
955 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.3, precision loss is 0.0729%
100000101 numbers generated
712 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
678 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
686 ms to find the median of 100 Mio items, which is 39820.00 (min=12345, max=9999999)
8931 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.9, precision loss is 0.0603%
hit any key

HI Marinus...I know, but i wanted just to compare the results of qsort and another sorting algo called Bitonic sort. I opened a thread and posted the source (in masm too) to benchmark later.
Btw... How do i exchange the order of 2 floats in SSE from different registers ? Ex:
xmm1 = 1, 2, 3, 4
xmm2 = 5, 6, 7, 8
How do i exchange the order of the low and hi qword on both of them at once in a way that result in:
xmm1 = 1, 2, 5, 6
xmm2 = 3, 4, 6, 7
With one or more of these beauties,
movlps movhps
shufps pshufd
unpcklps unpckhps
Play with them, once you understand how they work, they will be your best friends forever. :eusa_dance:

JJ. Does the adaptation you did to the algo works for negative limit as well ?
Ex..
fsub arrInf.yMin ; make positive
fmul arrInf.yRange ; normalise 0...99999
If the minimum limit is...say 1 and the max to 1 can this work too if i set those values ?

If the minimum limit is...say 1 and the max to 1 can this work too if i set those values ?
Like this?
include \masm32\MasmBasic\MasmBasic.inc
Init
Dim MyArray() As REAL8 ; can be DWORD, REAL4, REAL8
For_ ecx=0 To 9999999
Rand(1.0, 1.0, MyArray(ecx)) ; put random number into Real8 array
Next
PrintLine Str$("The elements:\t%i", MyArray(?))
PrintLine Str$("The minimum:\t%f", MyArray(?min)v)
PrintLine Str$("The maximum:\t%9f", MyArray(?max)v)
PrintLine Str$("The mean:\t%9f", MyArray(?mean)v)
PrintLine Str$("The median:\t%f", MyArray(?median)v)
EndOfCode
The elements: 10000000
The minimum: 0.9999998
The maximum: 0.999999964
The mean: 1.45892021e05
The median: 6.515711e05

Yes. perfect. Testing it right now. :greenclp: :greenclp:
Tks a lot. :thumbsup: :thumbsup: :thumbsup: :thumbsup: