Using SSE2, someone has a faster way to remove duplicates from an array of Real4? Btw... one for sorted and another for unsorted arrays. Pls
I have no idea about your intended end result.
Removing duplicates from an array of REAL4s is no different than removing duplicates from an array of 32-bit integers. The use of SSE2 instructions to provide any speed advantage may thus be questionable from that point of view. However, I have no expertise with SSE2 and someone else may want to inject some light on that subject.
Removing duplicates from arrays of REAL8s in 32-bit mode would be a different beast to be treated differently, while it could be done similarly in 64-bit mode.
Removing duplicates from arrays of REAL10s becomes "two bits" more tricky using integers! :dazzled:
Hi Raymond. I created a routine to remove duplicated values from an array of floats without using SSE2 (same as regular dwords, as you said). But I was thinking of a way to make it faster working with 4 dwords at once. This is for the dbscan algorithm I'm making. Since we are dealing with images, the final array of data (pixels) can be huge.
A similar approach to counting unique colours?
24-bit colour data used as an index into a colour array?
Many pixels with the same colour = many writes to the same array element?
For example, a 0,0,FF pixel maps to colour array[255]; set a bit flag that the colour exists.
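Roughly like this in C (just a sketch of the bit-flag idea; names made up, one presence bit per possible 24-bit colour):

#include <stdint.h>
#include <string.h>

/* Count unique 24-bit colours with a presence bitmap: one bit per possible colour. */
static uint32_t count_unique_colors(const uint32_t *pixels, size_t count)
{
    static uint8_t seen[(1 << 24) / 8];           /* 2 MB: one bit per colour */
    memset(seen, 0, sizeof(seen));
    uint32_t unique = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t c = pixels[i] & 0x00FFFFFF;      /* drop the alpha byte */
        if (!(seen[c >> 3] & (1u << (c & 7)))) {  /* colour not seen yet? */
            seen[c >> 3] |= (uint8_t)(1u << (c & 7));
            unique++;
        }
    }
    return unique;
}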
Hi Daydreamer. The pixels are in ARGB or BGRA format, therefore 32 bits. They are loaded using GDI+.
The part of the code I made to remove the duplicates is like this:
; 2 - remove the duplications (And make delta absolute)
xorps xmm2 xmm2
xor ecx ecx
xor edx edx
.Do
movss xmm0 X$esi+edx*4 ; esi = Pointer to the array of Floats
mov eax D$esi+edx*4
Do
inc edx
mov ebx D$esi+edx*4
On ebx = 0, jmp L1> ; out the loop if zero is found
On edx => D@num_points, jmp L1> ; out the loop if the counter reaches the number of points
Loop_Until eax <> ebx
mov D$esi+ecx*4 eax ; write the unique value back in place (esi is both input and output)
movss xmm1 X$esi+edx*4 ; prepare to get the maximum value of the delta inside this same routine
subss xmm0 xmm1 ; delta = current unique value - next different value
SSE_ABS_REAL4 xmm0 ; and convert it to absolute
maxss xmm2 xmm0 ; get the maximum distance between 2 points
inc ecx ; one more unique value stored
.Loop_Until edx >= D@num_points
L1:
What this part of the code does is, while removing the repeated values, calculate the maximum delta between each unique value and the next, to be used in another routine. Here, the array is already sorted.
Btw... the content of the array in esi is no longer the pixels, but the pixel density: a Real4 value representing the delta between their neighbors. This part I already calculated and placed in this array. The problem is that it contains thousands of duplicates that need to be removed. Also, in the same routine I calculate the maximum delta between the unique values in this array, to use in another part of the code.
Basically it does this:
Array: 30.0 50.0 50.0 50.0 50.0 51.6 99.9 .......
Remove the duplicates and also calculate the delta at once:
Array Result: 30.0 50.0 51.6 99.9 .......
DeltaMax = max(abs(30-50), abs(50-51.6), abs(51.6-99.9)) and so on = the maximum of the deltas between the unique values of the array.
After the main loop, ecx contains the total number of unique values in the array. There is no need to create another memory buffer to store the new array, since I'm using esi as both input and output.
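In plain C, that pass would look roughly like this (just a scalar reference for clarity; names are made up, and the early exit on a 0.0 value in the asm is left out):

#include <math.h>
#include <stddef.h>

/* Compact a sorted float array in place, dropping duplicates, and return the
   largest delta between consecutive unique values.  *unique_count takes the
   role of ecx above (the new length). */
static float dedup_and_max_delta(float *a, size_t n, size_t *unique_count)
{
    float max_delta = 0.0f;
    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        float cur = a[i];
        do { i++; } while (i < n && a[i] == cur);   /* skip the run of duplicates */
        a[out++] = cur;                             /* write the unique value back in place */
        if (i < n)
            max_delta = fmaxf(max_delta, fabsf(a[i] - cur));
    }
    *unique_count = out;
    return max_delta;
}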
SSE_ABS_REAL4 is a macro like this:
[SSE_ABS_REAL4 | pslld #1 1 | psrld #1 1]
and the other macro for the loop is:
[Do | K0:]
[Loop_Until | #=3 | cmp #1 #3 | jn#2 K0<], which for Loop_Until eax <> ebx unfolds as: cmp eax ebx | je K0<
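Side note on SSE_ABS_REAL4: the pslld/psrld pair simply clears the sign bit, which is the same as ANDing with 7FFFFFFFh. In C intrinsics terms it would be something like this (just to illustrate the trick):

#include <emmintrin.h>

/* Absolute value of packed REAL4s: clear the sign bit of each lane. */
static __m128 sse_abs_real4(__m128 x)
{
    return _mm_and_ps(x, _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF)));
}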
The problem is that the 1st loop is slow. I tried to extend it to look at the next 4 or even 8 dwords at once, but I failed to make it get back to the correct pointer.
like this:
xmm0 = 4 floats
xmm1 = next 4 floats
compare xmm0 with xmm1; if they are equal, jump over 8 floats and back to the loop, and so on. And this is where I was failing, because I couldn't make it go back to the proper pointer to do the comparison all over again.
This part of the code is responsible for getting the delta of the density of each pixel, which was pre-sorted with quicksort, but it contains thousands of duplicates. The performance already slowed down due to the usage of quicksort, and with this duplicate removal it gets a bit worse in terms of speed. That's why I'm trying to find a faster routine for at least one of those two main parts that slow down the code.
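One idea I might try, to keep the pointer bookkeeping simple: only use the 4-wide compare to skip whole runs of duplicates and let the scalar code handle everything else, so the read index never has to move backwards. A sketch of the idea in C with SSE2 intrinsics (untested, names made up):

#include <emmintrin.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Dedup a sorted REAL4 array in place and return the largest delta between
   consecutive unique values.  The SSE2 part only skips whole runs of 4
   duplicates at a time; everything else stays scalar. */
static float dedup_max_delta_sse2(float *a, size_t n, size_t *unique_count)
{
    float max_delta = 0.0f;
    size_t out = 0, i = 0;
    while (i < n) {
        float cur = a[i++];
        int curbits;
        memcpy(&curbits, &cur, sizeof(curbits));       /* reuse the bit pattern for integer compares */
        __m128i vcur = _mm_set1_epi32(curbits);        /* broadcast, like shufps reg,reg,0 */
        while (i + 4 <= n) {                           /* fast path: 4 duplicates at once */
            __m128i next4 = _mm_loadu_si128((const __m128i *)(a + i));
            int mask = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(next4, vcur)));
            if (mask != 0xF)
                break;                                 /* some lane differs: the scalar loop finds it */
            i += 4;
        }
        while (i < n && a[i] == cur)                   /* scalar tail of the run */
            i++;
        a[out++] = cur;                                /* write the unique value back in place */
        if (i < n)
            max_delta = fmaxf(max_delta, fabsf(a[i] - cur));
    }
    *unique_count = out;
    return max_delta;
}

Since the read index only ever moves forward, there is nothing to back up when the 4 values are not all equal; the scalar loop just picks up where the vector test stopped.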
Quote from: guga on June 03, 2024, 06:19:30 AM
The pixels are in ARGB or BGRA format, therefore 32 bits.
Therefore they're not floats; they're really 32-bit structures.
Not that it matters; you're just trying to get rid of any duplicate 32-bit things, whatever they may be.
Wish I could help you with the SSE2 stuff.
Not necessarily. The source pixel values are no longer used once the algorithm starts. The resulting value is the delta of each pixel in each channel, calculated from the distance between neighboring pixels. In fact, before the algorithm starts, it converts the image to grayscale. Converting to gray doesn't affect the result... at least, not in the tests I'm doing; to speed things up, I'm converting to grayscale as mentioned in the articles I read. If it works, it might be possible to work with the channels separately, in case that changes anything when working with color images. But I don't think it will change much, since, one way or another, when we convert to gray we are working with luma values that are invariant, that is, they do not change the aspect (texture, grain, hue, saturation, etc.) of the image.
What dbscan does is calculate the difference (delta) between each pixel and its neighbors, regardless of the colorspace used (RGB, CieLCH, HSL, etc.). Of course, any resulting value for this delta will not exceed the limit of one pixel (255, or the equivalent if not using RGB as input), but we no longer work directly with the pixels of the image, nor with their structures.
The entire delta is placed in an array (of Floats).
Quote
Wish I could help you with the SSE2 stuff.
Tks :) I'll give it another try later. It is possible, but I spent the last 2 or 3 days trying to improve this with nothing to show yet. So, I'll test later and check the results.
Wouldn't doing the SSE abs on 4 values at a time, followed by maxps xmm0,xmm2, speed it up?
Unroll the above a few times?
Unroll the inner loop and align 64 before the loop?
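Something like this maybe (C with SSE2 intrinsics, just to show the idea; it assumes the max-delta step is done as its own pass over the already deduplicated array):

#include <emmintrin.h>
#include <stddef.h>

/* 4-wide version of the abs + max step: deltas between neighbours,
   absolute value, running packed maximum, horizontal max at the end. */
static float max_abs_delta_sse2(const float *a, size_t n)
{
    const __m128 signmask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));
    __m128 vmax = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 5 <= n; i += 4) {
        __m128 cur  = _mm_loadu_ps(a + i);
        __m128 next = _mm_loadu_ps(a + i + 1);
        __m128 d = _mm_and_ps(_mm_sub_ps(next, cur), signmask);  /* |a[i+1] - a[i]| for 4 pairs */
        vmax = _mm_max_ps(vmax, d);
    }
    /* horizontal max of the 4 lanes */
    __m128 t = _mm_max_ps(vmax, _mm_shuffle_ps(vmax, vmax, _MM_SHUFFLE(2, 3, 0, 1)));
    t = _mm_max_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));
    float result = _mm_cvtss_f32(t);
    for (; i + 1 < n; i++) {                                     /* scalar tail */
        float d = a[i + 1] - a[i];
        if (d < 0) d = -d;
        if (d > result) result = d;
    }
    return result;
}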
If you compare with a packed compare-for-equality, e.g. 50.0 vs 50.0, you get a mask; use the mask to decide whether to advance EDI by one element or not.
Copy to a new array: if two copies are found, don't increment.
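In C terms the idea would be roughly this (sketch only; the array is assumed already sorted). The store always happens, only the write index is data-dependent, so there is no branch on the compare:

#include <stddef.h>

/* Branchless dedup of a sorted array: always write the value, only advance
   the output index when it differs from the previous element. */
static size_t dedup_branchless(float *a, size_t n)
{
    if (n == 0) return 0;
    size_t out = 1;                      /* the first element is always kept */
    for (size_t i = 1; i < n; i++) {
        a[out] = a[i];
        out += (a[i] != a[i - 1]);       /* 1 if unique, 0 if duplicate */
    }
    return out;                          /* number of unique values */
}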
Quote from: guga on June 03, 2024, 04:38:51 AM
make it faster working with 4 dwords at once
Have a look at PCMPEQD and PCMPESTRI.
Hi Guga,
Quote from: guga on June 02, 2024, 07:21:09 AM
someone has a faster way to remove duplicates from an array of Real4
Perhaps a faster way is to prevent adding duplicated values to the array.
Regards, HSE.
Quote from: HSE on June 04, 2024, 05:03:33 AM
Perhaps a faster way is to prevent adding duplicated values to the array.
But but but ... wouldn't that require a (linear, binary) search every time you add a value? Seems like that would be the slowest of all, compared to just going through the completed array one time.
Quote from: HSE on June 04, 2024, 05:03:33 AM
Hi Guga,
Quote from: guga on June 02, 2024, 07:21:09 AM
someone has a faster way to remove duplicates from an array of Real4
Perhaps a faster way is to prevent adding duplicated values to the array.
Regards, HSE.
That's possible with an SSE2 packed compare instruction, which creates masks of 0 or 0FFFFFFFFh:
movd xmm6, array-4   ; previous element (one dword back)
movd xmm7, array     ; current element
pcmpeqd xmm6, xmm7   ; xmm6 = 0FFFFFFFFh if it is a duplicate, 0 if not
movd xmm7, array     ; reload the current element
pandn xmm6, xmm7     ; zero if duplicate, the value itself if not
movd xmm0, sum
paddd xmm0, xmm6     ; add zero if duplicate, add the value if not a duplicate (addps if the values are REAL4 floats)
Quote from: NoCforMe on June 04, 2024, 05:18:42 AM
compared to just going through the completed array one time.
Think :biggrin:
I did think; that's why I posted that.
OK, as you build the list, the first versions will be very short, but getting longer each time, right? 1, 2, 3, ... n.
So if you do things your way you'll be dealing with a total of n(n+1)/2 potential comparisons, instead of just n.
(Of course if you use anything besides a dumbass linear search you can reduce that number, but never as low as n.)
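(To put a number on it: for n = 1,000,000 values, n(n+1)/2 is about 5 × 10^11 comparisons, against 10^6 for a single final pass.)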
Or am I missing something here?
Quote from: HSE on June 04, 2024, 08:48:16 AM
What is n?
The number of items in the list (or array if you prefer).
Quote from: NoCforMe on June 04, 2024, 09:00:45 AM
The number of items in the list (or array if you prefer).
:biggrin: When?
As you add items (you start off with zero, of course): 1, 2, 3, ... n.
This is all obvious, right? What are you getting at here?
Quote from: NoCforMe on June 04, 2024, 09:22:41 AM
What are you getting at here?
Apparently you are thinking that all pixels are different. Anyway, that is the worst case. But you don't know.
n is the number of pixels. Testing at the end, you have n*(n-1) comparisons (in the worst case), plus something to exclude the self-comparison or to use it later, or you have to rearrange the array when a pixel value is a duplicate.
Yes, that's the worst-case assumption.
So it depends on how many duplicates there actually are (which of course you have no idea when you're running). If there are few duplicates, then your suggestion (removing duplicates as you go rather than at the end) would probably be better.
It's a race between doing (potentially) n(n-1) comparisons (worst case) at the end, vs. potentially many more comparisons (1+2+3+4+ ... +n) if you do it your way, as you go.
I don't know. It seems like it would take more time your way, but now I'm not sure. (No expert in combinatorial math here.)
OK, let's try this: I'm not a mathematician, but I'm going to play one on TV for just a minute.
To formalize this, I believe (correct me if I'm wrong) that the potential total number of operations (O1) to do things your way--eliminating duplicates as they're added to the array--is this:
[attached image: O1.gif]
while the total number of operations (O2) if you wait until the end to remove duplicates is
[attached image: O2.gif]
Seems like the first number would be greater.
Now I'm going to sit back and wait for others who know more to shoot holes in my theory ...
At the end, probabilistically, it is always worse:
O2 > O1
n(m-1) > n(m+1)/2
Where
n >= m
n: number of values
m: number of unique values
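(Dividing both sides by n, that inequality reduces to m - 1 > (m + 1)/2, i.e. m > 3, so it holds whenever there are more than three unique values.)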
OK, I'll have to take your word for that.
BTW, I may have screwed up O1, which I guess is just the sum from 0 to n, i.e. 0 + 1 + 2 + ... + n
(using a linear search to see if the value being added is already there).
Tks a lot guys. I'll take a look. I'm just finishing the adjustments of the DbScan filter I described here https://masm32.com/board/index.php?topic=11919.30 and later I'll give the duplicate-removal problem a test.