Fast median algorithm

nidud · July 19, 2020, 06:53:01 AM

deleted

Siekmanski · July 19, 2020, 07:00:01 AM

EDIT: Spotted an error -> jb changed to jbe
New attachment: see reply#13

guga · July 19, 2020, 08:30:41 AM

Hi SiekManski. Tks you a lot. I´ll give a try and see how it works and if i can make it work also for even values.

Hi JJ. This is for a real routine i´m trying to create for image manipulation. I´m trying to port a routine made by google to assembly, but it is a true hell, since it was originally written in python and i´m not being able to convert it to C and neither test it to check if it is working as expected.

The main algorithm claims to remove/hide watermark on images and it is great not only for this purpose, but specially because it can be later adapted to remove any noise, scratches (and even lens distortions) on any images. If the algorithm works i can try later use it on video as well.

The link of the algorithm in python is here:
https://github.com/rohitrango/automatic-watermark-detection

The algorithm is quite complex, but basically it scans a sequence of 100 images, tries to estimate the watermark location using sobel operator, and later find the median of each one of the images on thee specified pixel locations.

So far i succeeded to port to assembly the 1st steps to the detection routine and ended improving the sobel algo as well. Now it is need to find the median of all pixels on all images and create another image containing only the medians values (for x and y pos). In fact, it creates 2 images, one for sobelX and other for SobelY from which it uses it later to try to identify the alfa values of each one of them (using canny) and later reconstruct the watermark using poisson reconstruction routines.

The algorithm is really amazing and the possibilities of use this are endless (not specific for watermark remover, but for image/video cleaning as well)

I´m not sure if i´ll succeed to port it completely, because when trying to run this stuff in visualstudio, my PC simply freezes all the time after some routines and i have no idea how to properly debug a python file using visualstudio (or even any python editors as well, since i have no idea of what python language syntax do in order to try to reproduce the functions).

About the sobel routine i improved, i suceeded to make the edges more thin and also identify all possible true edges rather then noise. Here is a example of what i suceeded to do so far on sobel algorithm

original image

Sobel Original

Sobel Equalized

Sobel Guga

Sobel Guga Equalized

My sobel version is a small improvement to the original sobel algorithm, because it uses percentage of the pixels to compute their magnitudes rather then simply cutoff frequencies as commonly used in sobel. This small update improves the overall quality of the image and find literally all edges, even when they are not bright as the others. So, it tries to stay proportional to the light of the pixels, rather then cutting of when not needed. Doing that it result in thinner edge lines and also a more equalized image. The proof is that when you equalize both images after passing through sobel, mine version has less artifacts then the equivalent sobel image after also passed to equalization.

My version stays within the limits of 0 to 127 for dark and 128 to 255 to brighter, regardless the values found on the magnitudes of Gx or Gx extrapolates. For example, on the normal sobel, you may find a magnitude of a pixel on a edge whose level is let´s say 300 or 400 and when it find´s it, it simply cut them off to be either black or white (0 or 255). What i did was basically calculate the amount of those extrapolations and adapt them to stay always withing the limits of dark (0 to 127) and bright (128 to 255) without having to cut the extra values off. Those extrapolations btw seems to be associated with the compression of the image that ends to pixelate it. That´s why we can´t simply cut them off and why mine version the edges don´t seems to bend to x and y pos. Each pixel in the sobel corresponds exactly to the luminosity intensity of a pixel on the same position

The only thing i didn´t decided yet is to know what to do with the borders of the image.

guga · July 19, 2020, 08:35:35 AM

Quote from: nidud on July 19, 2020, 06:53:01 AM
Quote from: guga on July 19, 2020, 05:34:13 AM
Ok, i´ll give a try and test the Heapmedian3 to see if it works as expected. However, as you said, it needs an extra buffer in memory which could be overkilling when we need to calculate the median of a larger array then only 27 bytes longs (say a array of floats, ints etc) with 100 Mb for example.

I[´ll try porting it to see if it works and also check for speed.

You may need to save the array for later use so it depends on the situation. In a benchmark test where the function is called repeatably that's at least the case.

Quote
I was trying to port Dansby_kth that is very fast and don´t uses extra buffer, but it is producing a wrong result and i can´t fix that :(

Do you have any idea of what could be possibly wrong ?

It's difficult to say what the right answer is here. If the array is even (1,2,3,4) the answer is 2 or 3 so you need to make a choice there I guess. The returned result seems to be 18 and I assume the correct answer 17. In that case it overshoots the index by one. Reducing the index by one seems at least on this array to give the right answer.

Code Select Expand
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; } float Dansby_kth(float array[], const unsigned int length) { unsigned int kTHvalue = length / 2 - 1; unsigned int left = 0; unsigned int left2 = 0; unsigned int right = length - 1; unsigned int right2 = length - 1; unsigned int kthPlus = kTHvalue; unsigned int kthMinus = kTHvalue; while (right > left) { if( array[right] < array[left] ) F_SWAP(array[left],array[right]); if( array[right] < array[kTHvalue] ) F_SWAP(array[kTHvalue],array[right]); if( array[kTHvalue] < array[left] ) F_SWAP(array[left],array[kTHvalue]); const float temp = array[kTHvalue];//this is pivot kthPlus = kTHvalue + 1; kthMinus = kTHvalue - 1; while ((right2 > kTHvalue) && (left2 < kTHvalue)) { do { left2++; }while (array[left2] < temp); do { right2--; }while (array[right2] > temp); F_SWAP(array[left2],array[right2]); } left2++; right2--; if (right2 < kthMinus) { while (array[left2] < temp) { left2++; } left = left2; right2 = right; kthMinus --; } else if (kthPlus < left2) { while (array[right2] > temp) { right2--; } right = right2; left2 = left; kthPlus ++; } else if( array[left] < array[right] ) { F_SWAP(array[right],array[left]); } } return array[kTHvalue]; }

Hi NiDud. When the number is even you simply compute the average of the 2 located in the middle. Like this:

Array = (1,2,3,4)
You will calculate the median using 2 and 3
Sum them up and divide by 2

So. The media for this is: (2+3)/2 = 2.5

So, you are computing the width / 2 and whatever values are in position (width/2) and (Width/2)+1 you simply calculate the average of those 2.

guga · July 19, 2020, 08:38:33 AM

Quote from: Siekmanski on July 19, 2020, 07:00:01 AM
EDIT: Spotted an error -> jb changed to jbe
New attachment: see reply#13

Thk you a lot, Siekmanski :) I´ll take a look :)

Siekmanski · July 19, 2020, 09:33:01 AM

Here is the routine for even and uneven population sizes.
No sorting, no memory swaps, no divs.

If you want me to explain the algorithm, feel free to ask.

New one: see http://masm32.com/board/index.php?topic=8671.msg94735#msg94735

guga · July 19, 2020, 09:39:18 AM

Wow

Tks you Siekmanski. I´ll take a look right now.

Btw. As i explained above to JJ, the development/improvement of this algorithms are really interesting.

I also succeeded yesterday to create a new algorithm that maybe be used to correctly equalize a histogram. In fact, it already does the job stretching the histogram to 128 pixels correctly balanced using a effect similar to gradients and the others 128 pixels missing could be retrieved back perfectly simply equalizing the whole histogram and filling eventual gaps. The problem is that i got stuck with the math, since i don´t know yet how to do with the values i´m finding. So far, the result is this:

Original

My algo

Siekmanski · July 19, 2020, 09:52:31 AM

I hope it does the job for you. ( and hope it's error free? )

You have to test it before using it, I have done some testing and found no errors......
Just made it tonight in a hurry when I got the idea for the algorithm.
If it works correct, the routine can be made faster.

HSE · July 19, 2020, 10:18:06 AM

Quote from: Siekmanski on July 19, 2020, 09:52:31 AM
( and hope it's error free? )

No, I'm afraid

If the class bigger than half population have more than one element, and one element in excess of half population, then that is the median no matter if population is even.

This is wrong:

Code Select

.data?
MedianBuffer    db 1024 dup (?) ; minimum buffer size must be as large as the biggest number in the data set

Must be:

Code Select

.data
MedianBuffer    db 1024 dup (0) ; minimum buffer size must be as large as the biggest number in the data set

Perhaps:

Code Select

CalculateMedian proc uses ebx esi edi
    
    mov     esi,offset DataSet
    mov     edi,offset MedianBuffer

    mov     ecx,PopulationSize
PopulateLP:
    mov     eax, [esi]
    inc     byte ptr [edi+eax]  
    add     esi,4   
    dec     ecx
    jnz     PopulateLP

    mov     ecx,PopulationSize
    mov     ebx,ecx
    shr     ebx,1
    and     ecx,1                   ; Check even or uneven
    jnz     UnEven
    dec     ebx                     ; Make even
UnEven:

    xor     edx,edx
    mov     eax,-1
FindMedianLP:
    inc     eax
    movzx   ecx,byte ptr[edi+eax]
    add     edx,ecx
    cmp     edx,ebx
    jbe     FindMedianLP

    ;----------------------
    .if edx > ebx          
        mov ecx, edx
        sub ecx, ebx
        .if ecx > 0
            jnz Done
        .endif    
    .endif
    ;-----------------------

    mov     edx,PopulationSize
    and     edx,1                   ; Check even or uneven
    jnz     Done

    mov     edx,eax
FindSecondMiddleNumberLP:
    inc     eax 
    movzx   ecx,byte ptr[edi+eax]
    test    ecx,ecx
    jz      FindSecondMiddleNumberLP
    add     eax,edx
    shr     eax,1
Done:                               ; Result in eax
    ret

CalculateMedian endp

Siekmanski · July 19, 2020, 12:18:07 PM

Hi HSE,
Always thougth Windows initializes the memory in the .rdata section to zero?
Never had a situation in MASM it did not happen.
Maybe other assemblers behave different?

Thank you for finding this bug.

Added these 2 lines:

EDIT: New one: see http://masm32.com/board/index.php?topic=8671.msg94735#msg94735

Code Select

    cmp		ecx,PopulationSize/2
    jb		average

Hope this solves it.
Can you test it?

Code Select

align 4    
CalculateMedian proc uses ebx esi edi
    
    mov     esi,offset DataSet
    mov     edi,offset MedianBuffer

    mov     ecx,PopulationSize
PopulateLP:
    mov     eax,[esi]
    inc     byte ptr [edi+eax]  
    add     esi,4   
    dec     ecx
    jnz     PopulateLP

    mov     ecx,PopulationSize
    mov     ebx,ecx
    shr     ebx,1
    and     ecx,1                   ; Check even or uneven
    jnz     UnEven
    dec     ebx                     ; Make even
UnEven:

    xor     edx,edx
    mov     eax,-1
FindMedianLP:
    inc     eax
    movzx   ecx,byte ptr[edi+eax]
    add     edx,ecx
    cmp     edx,ebx
    jbe     FindMedianLP

    mov     edx,PopulationSize
    and     edx,1                   ; Check even or uneven
    jnz     Done
    mov     edx,eax
FindSecondMiddleNumberLP:
    inc     eax
    movzx   ecx,byte ptr[edi+eax]
    
    cmp     ecx,PopulationSize/2
    jb      Average
    
    test    ecx,ecx
    jz      FindSecondMiddleNumberLP
Average:
    add     eax,edx
    shr     eax,1
Done:                               ; Result in eax
    ret

guga · July 19, 2020, 03:36:43 PM

I´m not sure i understood what is wrong with the code as suggested by HSE. I´m testing it here and didn´t found incorrect values so far. What i´m missing ?

For a tiny table of pixels (256 integer values only from 0 to 255) the algorithm seems to work fine, so far. I ported it to RosAsm trying to make it a bit faster (not measured to see what happened), using local variables so i could try to understand what the algo is doing. . Also it only needs to preserve ecx, since the data is stored in local vars, so no extra push/pop chains at the start and end of the function. And i set the data in the table to dwords, to replace routines like this:

Code Select


(...)
FindMedianLP:
    inc     eax
    movzx   ecx,byte ptr[edi+eax]
    add     edx,ecx
    cmp     edx,ebx
    jbe     FindMedianLP

To something like:

Code Select


(....)
MedianBuffer    dd 256 dup (0)
(...)

FindMedianLP:
    inc     eax
    add     ecx [edi+eax] ; or add  ecx [MedianBuffer+eax] to avoid using more registers.
    cmp     edx,ebx
    jbe     FindMedianLP

Can you explain to me how does it is working ? I´m not fully understood why.

And also, does it needs to use a table ? This is because even if the algo can be improved for speed, the table should be zeroed whenever the function is called and if we are dealing with huge amount of data like pixels in videos or in my case, hundreds of images to be analysed), then, cleaning the table could slow down the algo a little bit.

And another question,. how to make it work with SSE on Float or Real Numbers ? I mean, the algo is working on a array of integers (gray maps), but it won´t work on another table that is a array of floats to used to compute the median of Sobel data.

I ported like this:

Code Select



[MedianPopulateTable: D$ 0 #256] ; minimum buffer size must be as large as the biggest number in the data set. In my case it is a range of pixels so, 256 values only. In this case, i´m using the table as a 256 dword (D$ in RosAsm syntax) array

Proc CalculateMedian3:
    Arguments @Input, @ArraySize
    Local @IsArraySizeOdd, @MiddleTablePos, @MedianLeftVal, @iCounter
    Uses ecx

    mov ecx D@ArraySize | mov D@iCounter ecx ;  "ArraySize" is the same as PopulationSize. I just renamed it as array so i could better follow what it is doing in the other routines i´m testing it with.
    mov ecx D@Input
    mov D@IsArraySizeOdd &TRUE

    ; Populate Buffer Table
@PopulateLP:
        mov eax D$ecx
        inc D$MedianPopulateTable+eax*4
        add ecx 4
    dec D@iCounter | jne @PopulateLP

    mov ecx D@ArraySize | mov D@MiddleTablePos ecx | shr D@MiddleTablePos 1
    Test_If cl cl ; Is number even ? Set IsArraySizeOdd to FALSE and decrease the position of the array table to be compared with
        dec D@MiddleTablePos
        mov D@IsArraySizeOdd &FALSE ;  set Flag to not odd number (So, it's a even)
    Test_End

    ; FindMedianLP
    xor ecx ecx
    mov eax 0-1
    Do
        inc eax
        add ecx D$MedianPopulateTable+eax*4
    Loop_Until ecx > D@MiddleTablePos

    If D@IsArraySizeOdd = &FALSE ; is Array Size even ?
        ; FindSecondMiddleNumberLP
        mov D@MedianLeftVal eax
        Do
            inc eax
        Loop_Until D$MedianPopulateTable+eax*4 <> 0
        add eax D@MedianLeftVal
        shr eax 1
    End_If

    ; Result in eax

EndP

jj2007 · July 19, 2020, 05:55:47 PM

Quote from: guga on July 19, 2020, 08:30:41 AMNow it is need to find the median of all pixels on all images and create another image containing only the medians values (for x and y pos)

- If we are talking about 32-bit colour images, does it mean you need the routine at byte size level?
- Have you thought of using the arithmetic medium instead? It's definitely much faster, and in practice there might be little difference.

Below an illustration of the difference between median and average. The only little problem is that your data are not sorted

daydreamer · July 19, 2020, 07:46:10 PM

Quote from: guga on July 19, 2020, 03:36:43 PM
I´m not sure i understood what is wrong with the code as suggested by HSE. I´m testing it here and didn´t found incorrect values so far. What i´m missing ?

For a tiny table of pixels (256 integer values only from 0 to 255) the algorithm seems to work fine, so far. I ported it to RosAsm trying to make it a bit faster (not measured to see what happened), using local variables so i could try to understand what the algo is doing. . Also it only needs to preserve ecx, since the data is stored in local vars, so no extra push/pop chains at the start and end of the function. And i set the data in the table to dwords, to replace routines like this:

Code Select Expand
(...) FindMedianLP: inc eax movzx ecx,byte ptr[edi+eax] add edx,ecx cmp edx,ebx jbe FindMedianLP

To something like:
Code Select Expand
(....) MedianBuffer dd 256 dup (0) (...) FindMedianLP: inc eax add ecx [edi+eax] ; or add ecx [MedianBuffer+eax] to avoid using more registers. cmp edx,ebx jbe FindMedianLP

Can you explain to me how does it is working ? I´m not fully understood why.

And also, does it needs to use a table ? This is because even if the algo can be improved for speed, the table should byte zeroed whenever the function is called and if we are dealing with huge amount of data like pixels in videos or in my case, hundreds of images to be analysed), then, cleaning the table could slow down the algo a little bit.

And another question,. how to make it work with SSE on Float or Real Numbers ? I mean, the algo is working on a array of integers (gray maps), but it won´t work on another table that is a array of floats to used to compute the median of Sobel data.

maybe you should have a big dynamic alloc memory buffer already cleared,so that simple move on to next 256 bytes,next time its called?,your going to use dynamic alloc anyway to process big image data
thats what I mean with SSE MAXPS and MINPS is:
if arraya>arrayb swap(arraya,arrayb) :

arrayb=MAXSS(arraya,arrayb)
arraya=MINSS(arraya,arrayb)
so unroll it MAXPS and MINPS does 4 of these conditional swaps in parallel, which could help with big image data,and MAXSS and MINSS to sort the final few numbers

HSE · July 20, 2020, 04:28:29 AM

Hi Siekmanski!

I have forgotten to say: very smart idea!

Quote from: Siekmanski on July 19, 2020, 12:18:07 PM
Maybe other assemblers behave different?

Could be that!

Now algo is failling. Something strange:

Code Select

   movzx   ecx,byte ptr[edi+eax]
   
    cmp     ecx,PopulationSize/2    <--- ????
    jb      Average

In previous version I dont write very well, must be (I think):

Code Select

    .if edx > ebx          
        mov ecx, edx
        sub ecx, ebx
        cmp ecx , 0
        jg Done
    .endif

guga · July 20, 2020, 04:29:40 AM

Quote from: jj2007 on July 19, 2020, 05:55:47 PM
Quote from: guga on July 19, 2020, 08:30:41 AMNow it is need to find the median of all pixels on all images and create another image containing only the medians values (for x and y pos)

- If we are talking about 32-bit colour images, does it mean you need the routine at byte size level?
- Have you thought of using the arithmetic medium instead? It's definitely much faster, and in practice there might be little difference.

Below an illustration of the difference between median and average. The only little problem is that your data are not sorted

Hi JJ. Yes they are 32 bit color images. About the routines being in byte level, it depends of what part of the routine is being used, i presume. To gain a bit more speed, i converted the images to gray, and insert the data on a huge table to be manipulated later.
even knowing that each gray image has only one color, instead using a table of bytes, i simply convert them to Dwords to gain speed in other parts of the routines, to avoid having to use movzx eax B$ecx+....... So a simple mov eax D$xxx is enough to eliminate the needs of using extra code.

For example, to estimate the watermark, all images to be calculated are converted previously to gray and inserted on a huge table to hold all of the data. So, the 1st thing to do is:

1 - Allocate enough memory to hold all images to be processed. So far it uses 4 Dword Tables per image. One for greymap, ohher for Sobelx, other for SobelY and other for Sobel. The size of each table that stores the pixel data is calculate as: Width*height*4*4. Where 4 is because now seems to be needed them in a Dword format and not byte), and the other 4 because we are using 4 pixel tables (or maps)

The values are stored on a simple Structure followed by a array of structures to manip7ulate the data to be used.

Code Select


[ImgDataHeader:
 ImgDataHeader.Size: D$ 0 ; Total size of our structure with the arrays included
 ImgDataHeader.Count: D$ 0 ; total amount of valid data structures. So, the total of images to be processed that fits to a inputed size. Ex: Count only the images on a directory that have resolution of  960*720
 ImgDataHeader.DataSize: D$ 0 ; Size of each data chunk. Width*Height*4
 ImgDataHeader.MaskWidth: D$ 0 ; Width of the image that will be later converted to a mask
 ImgDataHeader.MaskHeight: D$ 0] ; Width of the image that will be later converted to a mask

Immediately followed by a array of N "FullImageData" structures that are only pointers to the processed data. N is given by  "ImgDataHeader.Count"

[FullImageData:
 FullImageData.pGrayMap: D$ 0 ; Pointer to graymap
 FullImageData.pSobelX: D$ 0 ; Pointer to SobelXMap
 FullImageData.pSobelY: D$ 0 ; Pointer to SobelYMap
 FullImageData.pSobel: D$ 0] ; Pointer to SobelMap
(....)

 FullImageData.pGrayMap100: D$ 0 ; Pointer to graymap of the 100th image. Ex: Points to GreymapImage1 (The 1st greymap)
 FullImageData.pSobelX100: D$ 0 ; Pointer to SobelXMap of the 100th image Ex: Points to SobelXMap1 (The 1st SobelXMap1)
 FullImageData.pSobelY100: D$ 0 ; Pointer to SobelYMap of the 100th image Ex: Points to SobelYMap1 (The 1st SobelYMap1)
 FullImageData.pSobel100: D$ 0] ; Pointer to SobelMap of the 100th image Ex: Points to SobelMap1 (The 1st SobelMap1)
]

Immediatelly followed by the maps, Starting with the greymap
[GreymapImage1: D$ 0#(960*720*4)
(...)
GreymapImage100: D$ 0#(960*720*4)
(...)
[SobelXMap1: D$ 0#(960*720*4)
 SobelYMap1: D$ 0#(960*720*4)
 SobelMap1: D$ 0#(960*720*4)
(...)
SobelXMap100: D$ 0#(960*720*4)
 SobelYMap100: D$ 0#(960*720*4)
 SobelMap100: D$ 0#(960*720*4)

2 - Estimate4 the median value of Sobel

3 - Didn´t reached yet the next parts of the documentation to understand what nees to be done. The next part according to the python version is read the inputed image from whom you wish to remove the watermark, apply a poisson algorithm on the images (don´t know how to do yet, because i didn´t understood what the python version is actually doing.)

All the method is described here:
https://watermark-cvpr17.github.io/supplemental/
https://watermark-cvpr17.github.io/
https://ai.googleblog.com/2017/08/making-visible-watermarks-more-effective.html

And the link to the python version in github i posted.

If you want to read the google documentation that describes the algorithm it can be find here:
http://openaccess.thecvf.com/content_cvpr_2017/papers/Dekel_On_the_Effectiveness_CVPR_2017_paper.pdf

About using average rather the median, well...i agree that it cold improve speed, but i don´t know what the results could be, because the original google article it says to use median to retrieve better results.

The MASM Forum

News:

Fast median algorithm

nidud

Siekmanski

guga

guga

guga

Siekmanski

guga

Siekmanski

HSE

Siekmanski

guga

jj2007

daydreamer

HSE

guga