Author Topic: Timings for fast median algo  (Read 1810 times)

jj2007

  • Member
  • *****
  • Posts: 10861
  • Assembler is fun ;-)
    • MasmBasic
Re: Timings for fast median algo
« Reply #30 on: July 25, 2020, 08:36:58 PM »
My routine is not to blame, must be something else.  :cool:

You do it without normalisation: mov eax, (9488645+1) * 4, and that works :sad:

It's not a blame game, Marinus, I am just trying to understand why my version misbehaves for that specific case. Usually the deviation is in the 0.02% range, here it is 3%, a factor 150 higher...

Most probably it has to with resolution. The range here is 1332...9488645, and the real median is 38532, with is 0.4% of the max value. The distribution looks a bit extreme, see below. If I increase the buffer to 400,000 bytes, all is fine again.
Code: [Select]
179     bytes for ArrInf1
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00
0 ms to find the median of 11 items, which is 31328.00
0 ms to find the median of 11 items, which is 31328.00
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0245, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00
0 ms to find the median of 110 items, which is 38522.00
0 ms to find the median of 110 items, which is 38522.00
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.457, precision loss is -0.0260%

1000001 numbers generated
5 ms to find the median of 1 Mio items, which is 39751.00
5 ms to find the median of 1 Mio items, which is 39751.00
5 ms to find the median of 1 Mio items, which is 39751.00
100 ms for sorting, the real median is 39792.00
Speed gain is a factor 17.1, precision loss is -0.103%

10000010 numbers generated
60 ms to find the median of 10 Mio items, which is 39825.00
57 ms to find the median of 10 Mio items, which is 39825.00
57 ms to find the median of 10 Mio items, which is 39825.00
1048 ms for sorting, the real median is 39796.00
Speed gain is a factor 17.9, precision loss is 0.0729%

100000101 numbers generated
615 ms to find the median of 100 Mio items, which is 39820.00
587 ms to find the median of 100 Mio items, which is 39820.00
591 ms to find the median of 100 Mio items, which is 39820.00
9960 ms for sorting, the real median is 39796.00
Speed gain is a factor 16.7, precision loss is 0.0603%

Siekmanski

  • Member
  • *****
  • Posts: 2453
Re: Timings for fast median algo
« Reply #31 on: July 25, 2020, 09:04:49 PM »
That wasn't my intention, probably misunderstood the situation.
Sorry, if I gave you that feeling.

I just allocated the minimum buffer size for the data set.
What do you mean exactly with the normalisation?
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 10861
  • Assembler is fun ;-)
    • MasmBasic
Re: Timings for fast median algo
« Reply #32 on: July 25, 2020, 09:14:41 PM »
No bad feeling here, Marinus :biggrin:

What I am doing here is to adapt the algo to any range of values by normalising it:
Code: [Select]
aiPop:
  if aiInt
fild aiSize$ PTR [esi+aiSize*ecx] ; get original integer
  else
fld aiSize$ PTR [esi+aiSize*ecx] ; get original real
  endif
  fsub arrInf.yMin ; make positive
  fmul arrInf.yRange ; normalise 0...99999
  fistp arrInf.yIndex ; we need a
  mov eax, arrInf.yIndex ; reg32
  inc dword ptr [edi+4*eax] ; Populate the Median buffer with the values as index numbers and keep track for duplicates in one go
  dec ecx
  jns aiPop

That works fine, but (see image above) for very skewed distributions the deviation can be significant.

Results for the latest version (attached):
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0149, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.444, precision loss is -0.0260%

1000001 numbers generated
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
7 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
112 ms for sorting, the real median is 39792.00
Speed gain is a factor 13.4, precision loss is -0.103%

10000010 numbers generated
83 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
87 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
84 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
1146 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.5, precision loss is 0.0729%

100000101 numbers generated
848 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
842 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
841 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
10829 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.8, precision loss is 0.0603%

Siekmanski

  • Member
  • *****
  • Posts: 2453
Re: Timings for fast median algo
« Reply #33 on: July 25, 2020, 09:20:31 PM »
 :thumbsup:

Hi Jochen,

Now I understand what your goal is.
What about this idea to speed it up multiple times.
Multithreading.
Divide the data set in equal parts ( / max cpu thread count )
1 Median Buffer for each thread, when done adding the buffers together per index number and find the median, will speed it up.
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 10861
  • Assembler is fun ;-)
    • MasmBasic
Re: Timings for fast median algo
« Reply #34 on: July 25, 2020, 09:23:24 PM »
Well.... at 842 ms to find the median of 100 Mio items, I am not eager to dive into the secrecies of multithreaded programs ;-)

Google How do you find the median of a large data set? to see this:
Quote
Order the values from smallest to largest.
If the data set contains an odd number of values , choose the one that is exactly in the middle. You've found the median.
If the data set contains an even number of values , take the two values that appear in the middle and average them to find the median.

So they think sorting is the way to go, which implies that their idea of a "large" data set is different from ours :tongue:

Siekmanski

  • Member
  • *****
  • Posts: 2453
Re: Timings for fast median algo
« Reply #35 on: July 25, 2020, 09:53:46 PM »
Sometimes, thinking outside of the box can give you a reward.
Most ideas are based on other ideas without understanding why the first idea was constructed that way.
Many ancient fast routines were designed to work for small memory sizes and slow memory reads and writes. Not a problem anymore.

I came up with this idea when I wrote a routine to find the unique number count of colors in a bitmap image.
When guga needed a fast median routine, I knew this would be fast, especially in a small range ( 0-255 ) and millions of colors.

Later it turned out to be for only 100 colors per iteration.
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 10861
  • Assembler is fun ;-)
    • MasmBasic
Re: Timings for fast median algo
« Reply #36 on: July 25, 2020, 10:22:42 PM »
This is the goal :cool:
Code: [Select]
Dim MyArray() As REAL8   ; or DWORD, REAL4, ...
... fill it with values ...
Print Str$("The array has %i elements\n", MyArray(?))
Print Str$("The median is %f (a proxy, typically 0.02% off) \n", MyArray(median?))
Print Str$("The average is %f\n", MyArray(average?))
Print Str$("The minimum is %f\n", MyArray(min?))
Print Str$("The maximum is %f\n", MyArray(max?))

daydreamer

  • Member
  • *****
  • Posts: 1422
  • building nextdoor
Re: Timings for fast median algo
« Reply #37 on: July 25, 2020, 11:58:19 PM »
I think multithreading could be a good idea

maybe would be useful for Guga to have extra thread dedicated to filereading/filewriting in background,instead of main thread switches between fast algos running median/image/videoframe processing and to slow write processed images/data to disk when RAM is filled and readin unprocessed images/data ,to start the main loop again
Showcase :
With Masm sdk and 2-3 hours = a windows program :D
Beat that C zealots p:

guga

  • Member
  • *****
  • Posts: 1378
  • Assembly is a state of art.
    • RosAsm
Re: Timings for fast median algo
« Reply #38 on: July 26, 2020, 09:31:47 AM »
Multi-thread could be a good way to go, but 1st we need to see if the watermark remover algo will work as expected (i presume it will, but we didn´t get there yet :greensml: :greensml: :greensml:)

Btw....Someone has the timming of quicksort algo ? I would like to compare it with one thing i´m finishing here and will post soon :azn: :azn: :azn:.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

Siekmanski

  • Member
  • *****
  • Posts: 2453
Re: Timings for fast median algo
« Reply #39 on: July 26, 2020, 05:13:22 PM »
In the MASM  lib you can find Qsort routines.
Creative coders use backward thinking techniques as a strategy.

jj2007

  • Member
  • *****
  • Posts: 10861
  • Assembler is fun ;-)
    • MasmBasic
Re: Timings for fast median algo
« Reply #40 on: July 26, 2020, 06:42:52 PM »

guga

  • Member
  • *****
  • Posts: 1378
  • Assembly is a state of art.
    • RosAsm
Re: Timings for fast median algo
« Reply #41 on: July 26, 2020, 06:43:46 PM »
HI Marinus...I know, but i wanted just to compare the results of qsort and another sorting algo called Bitonic sort. I opened a thread and posted the source (in masm too) to benchmark later.

Btw... How do i exchange the order of 2 floats in SSE from different registers ? Ex:

xmm1 = 1, 2, 3, 4
xmm2 = 5, 6, 7, 8

How do i exchange the order of the low and hi qword on both of them at once in a way that result in:

xmm1 = 1, 2, 5, 6
xmm2 = 3, 4, 6, 7
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

  • Member
  • *****
  • Posts: 1378
  • Assembly is a state of art.
    • RosAsm
Re: Timings for fast median algo
« Reply #42 on: July 26, 2020, 06:48:59 PM »
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

LiaoMi

  • Member
  • ****
  • Posts: 713
Re: Timings for fast median algo
« Reply #43 on: July 26, 2020, 07:03:24 PM »
No bad feeling here, Marinus :biggrin:

What I am doing here is to adapt the algo to any range of values by normalising it:
Code: [Select]
aiPop:
  if aiInt
fild aiSize$ PTR [esi+aiSize*ecx] ; get original integer
  else
fld aiSize$ PTR [esi+aiSize*ecx] ; get original real
  endif
  fsub arrInf.yMin ; make positive
  fmul arrInf.yRange ; normalise 0...99999
  fistp arrInf.yIndex ; we need a
  mov eax, arrInf.yIndex ; reg32
  inc dword ptr [edi+4*eax] ; Populate the Median buffer with the values as index numbers and keep track for duplicates in one go
  dec ecx
  jns aiPop

That works fine, but (see image above) for very skewed distributions the deviation can be significant.

Results for the latest version (attached):
Code: [Select]
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0149, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.444, precision loss is -0.0260%

1000001 numbers generated
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
7 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
8 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
112 ms for sorting, the real median is 39792.00
Speed gain is a factor 13.4, precision loss is -0.103%

10000010 numbers generated
83 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
87 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
84 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
1146 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.5, precision loss is 0.0729%

100000101 numbers generated
848 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
842 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
841 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
10829 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.8, precision loss is 0.0603%

Code: [Select]
179     bytes for ArrInf1
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
Precision loss is relative to the 'real' median, which is obtained by sorting the DWORDs

11 numbers generated
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms to find the median of 11 items, which is 31328.00 (min=8260, max=5746454)
0 ms for sorting, the real median is 31349.00
Speed gain is a factor 0.0166, precision loss is -0.0670%

110 numbers generated
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms to find the median of 110 items, which is 38522.00 (min=1332, max=9488645)
0 ms for sorting, the real median is 38532.00
Speed gain is a factor 0.488, precision loss is -0.0260%

1000001 numbers generated
7 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
6 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
6 ms to find the median of 1 Mio items, which is 39751.00 (min=-12313, max=9999999)
91 ms for sorting, the real median is 39792.00
Speed gain is a factor 14.1, precision loss is -0.103%

10000010 numbers generated
70 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
72 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
72 ms to find the median of 10 Mio items, which is 39825.00 (min=-12340, max=9999995)
955 ms for sorting, the real median is 39796.00
Speed gain is a factor 13.3, precision loss is 0.0729%

100000101 numbers generated
712 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
678 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
686 ms to find the median of 100 Mio items, which is 39820.00 (min=-12345, max=9999999)
8931 ms for sorting, the real median is 39796.00
Speed gain is a factor 12.9, precision loss is 0.0603%
hit any key 

Siekmanski

  • Member
  • *****
  • Posts: 2453
Re: Timings for fast median algo
« Reply #44 on: July 26, 2020, 08:05:19 PM »
HI Marinus...I know, but i wanted just to compare the results of qsort and another sorting algo called Bitonic sort. I opened a thread and posted the source (in masm too) to benchmark later.

Btw... How do i exchange the order of 2 floats in SSE from different registers ? Ex:

xmm1 = 1, 2, 3, 4
xmm2 = 5, 6, 7, 8

How do i exchange the order of the low and hi qword on both of them at once in a way that result in:

xmm1 = 1, 2, 5, 6
xmm2 = 3, 4, 6, 7

With one or more of these beauties,

movlps movhps
shufps pshufd
unpcklps unpckhps

Play with them, once you understand how they work, they will be your best friends forever.  :eusa_dance:
Creative coders use backward thinking techniques as a strategy.